Prometheus monitoring #986
Comments
It would be great for Tendermint to report metrics. Would you be interested in writing an input plugin for https://github.com/influxdata/telegraf? It will probably require some work on the TM side too (API). P.S. tm-monitor now lives here: https://github.com/tendermint/tools/tree/master/tm-monitor
Another option is to use https://github.com/go-kit/kit/tree/master/metrics. go-kit does require using one of its adapters:
I'm not sure we want to mirror these adapters with settings in TM (given that it may be faster to send directly, as opposed to using an HTTP client). I am open to other solutions as well.
Do we need to change anything in Tendermint to support this?
I apologize for not having sent any pull requests regarding this, and I probably will not. It's an architectural decision on your side, I think. As supporting evidence, go-kit/metrics itself states: "It's primarily designed to help you get started with good and robust instrumentation, and to help you migrate from a less-capable system like Graphite to a more-capable system like Prometheus. If your organization has already standardized on an instrumentation system like Prometheus, and has no plans to change, it may make sense to use that system's instrumentation library directly." What would have to change, AFAIK, is
I kind of disagree with the "post-launch" target, because I wouldn't want to see the launch fail due to people not being able to monitor their validators properly :)
While integrating the Prometheus client library and exposing the corresponding endpoint for Prometheus to scrape is relatively straightforward, I assume that gathering the metrics themselves will be the challenge. @ebuchman @melekes Would a reasonable approach be to create a new service that hooks into the node's event bus to listen for significant events and observes timings and other information through that?
Note that something else worth considering for more powerful monitoring would be integrating a tracing system. It's a bit more opinionated, but I think it has a lot of value for HA production systems.
@t-bast While both are in the realm of observability, they are not interchangeable. Could you open a new issue where we can discuss the merits and feasibility of tracing for Tendermint?
Thanks, done: #1342
I don't think we want to commit to any specific instrument like Prometheus. Our users should be free to choose whatever they want: statsd, Prometheus, etc. I like the go-kit metrics package. It supports multiple backends and provides a clean API (https://github.com/go-kit/kit/tree/master/metrics). opencensus looks cool. It comes with the benefit of tracing, which may be implemented later at some point. But its API looks a bit weird with views (https://opencensus.io/go/index.html).
As long as it's possible to build and plug into Tendermint an exporter to convert the metrics to the format of our choice (be it Prometheus, OpenCensus or anything else) we'd be happy ;)
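To illustrate what "multiple backends behind a clean API" could mean in practice, here is a minimal standard-library sketch. The `Gauge` interface below is a simplified, hypothetical stand-in written for this example; it is not go-kit's actual interface, and `memoryGauge`, `discardGauge`, `ConsensusMetrics`, and `recordHeight` are likewise made-up names.

```go
package main

import (
	"fmt"
	"sync"
)

// Gauge is a hypothetical, simplified interface in the spirit of a
// backend-agnostic metrics package. It is NOT go-kit's real API.
type Gauge interface {
	Set(value float64)
}

// memoryGauge is one possible backend: it keeps the value in memory.
type memoryGauge struct {
	mu    sync.Mutex
	value float64
}

func (g *memoryGauge) Set(v float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.value = v
}

func (g *memoryGauge) Value() float64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.value
}

// discardGauge is another backend: it drops every value, which could be
// used when metrics are disabled.
type discardGauge struct{}

func (discardGauge) Set(float64) {}

// ConsensusMetrics groups the metrics that instrumented code writes to.
// It depends only on the interface, never on a concrete backend.
type ConsensusMetrics struct {
	Height Gauge
}

// recordHeight is an example of instrumented code.
func recordHeight(m ConsensusMetrics, height int64) {
	m.Height.Set(float64(height))
}

func main() {
	mem := &memoryGauge{}
	recordHeight(ConsensusMetrics{Height: mem}, 42)            // stored in memory
	recordHeight(ConsensusMetrics{Height: discardGauge{}}, 42) // silently dropped
	fmt.Println(mem.Value())
}
```

Swapping the backend then only changes how `ConsensusMetrics` is constructed, not the instrumented code.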
I would like to see a Prometheus monitoring endpoint available (basic example). The use case is for running a Cosmos Validator, where escrowed value is at risk of improper operations.
Performance considerations permitting (compile-time option?), I assume integrating the client library, exposing the endpoint, and adding basic metrics (such as `latest_block`, which are currently available at the `46657/status` endpoint) should be fairly straightforward. From there, the actual metrics could be added on a pull request basis, or Tendermint users could patch in their own (possibly experimental) metrics.

The old tm-monitor repository implies that this has been considered before. What was the outcome? In my opinion, Tendermint should expose the information necessary to monitor its operation and state changes, but not (officially) provide the tools to monitor them, as project clients may have varying requirements. Moreover, if there are multiple ways of implementing a solution, a (semi-)standard should be chosen instead of a homegrown solution.
With the Christmas holiday coming up, I could possibly contribute pull requests. I have professional experience with C++ and Python; however, Go and this codebase are new to me.
Looking forward to the discussion!