Prometheus monitoring #986

Closed
nickray opened this issue Dec 18, 2017 · 10 comments

@nickray

nickray commented Dec 18, 2017

I would like to see a Prometheus monitoring endpoint available (basic example). The use case is for running a Cosmos Validator, where escrowed value is at risk of improper operations.

Performance considerations permitting (compile-time option?), I assume that integrating the client library, exposing the endpoint, and adding basic metrics (such as latest_block, which is currently available at the 46657/status endpoint) should be fairly straightforward. From there, the actual metrics could be added on a pull-request basis, or Tendermint users could patch in their own (possibly experimental) metrics.

The old tm-monitor repository implies that this has been considered before. What was the outcome? In my opinion, Tendermint should expose the information necessary to monitor its operation and state changes, but should not (officially) provide the tools to monitor them, as project clients may have varying requirements. Moreover, if there are multiple ways to implement a solution, a (semi-)standard should be chosen over a homegrown one.

With the Christmas holiday coming up, I could possibly contribute pull requests. I have professional experience with C++ and Python; however, Go and this codebase are new to me.

Looking forward to the discussion!

@melekes
Contributor

melekes commented Dec 20, 2017

It would be great for Tendermint to report metrics. Would you be interested in writing an input plugin for https://github.com/influxdata/telegraf? It will probably require some work on the TM side too (API).

P.S. tm-monitor now lives here: https://github.com/tendermint/tools/tree/master/tm-monitor

@melekes
Contributor

melekes commented Dec 20, 2017

Another option is to use https://github.com/go-kit/kit/tree/master/metrics.

go-kit does require using one of the adapters:

var dur metrics.Histogram = prometheus.NewSummaryFrom(stdprometheus.SummaryOpts{
    Namespace: "myservice",
    Subsystem: "api",
    Name:      "request_duration_seconds",
    Help:      "Total time spent serving requests.",
}, []string{})

Not sure we want to mirror these adapters with settings in TM (given that it may be faster to send metrics directly, as opposed to using an HTTP client).

I am open to other solutions as well.

@ebuchman
Contributor

Do we need to change anything in Tendermint to support this?

@ebuchman ebuchman added the C:rpc Component: JSON RPC, gRPC label Feb 19, 2018
@greg-szabo greg-szabo added this to the post-launch milestone Feb 19, 2018
@nickray
Author

nickray commented Feb 23, 2018

I apologize for not having sent any pull requests regarding this, and probably will not. It's an architectural decision on your side I think.

As supporting evidence, go-kit/metrics itself states: "It's primarily designed to help you get started with good and robust instrumentation, and to help you migrate from a less-capable system like Graphite to a more-capable system like Prometheus. If your organization has already standardized on an instrumentation system like Prometheus, and has no plans to change, it may make sense to use that system's instrumentation library directly."

What would have to change, AFAIK, is:

  • adding an endpoint for scraping the metrics (configurable whether/where it's exposed)
  • linking in the https://github.com/prometheus/client_golang library
  • starting to expose relevant metrics (this can be minimal at first, and ideally someone running a node can easily extend those metrics by hacking the code)
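
The three points above could be sketched roughly as follows, using only the standard library as a stand-in for prometheus/client_golang. `InstrumentationConfig`, `Registry`, `NewMetricsServer`, the metric names, and the listen address are all hypothetical names chosen for this example, not Tendermint's actual configuration or API.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
	"sync"
)

// InstrumentationConfig is a hypothetical config section controlling
// whether and where the metrics endpoint is exposed.
type InstrumentationConfig struct {
	Enabled    bool
	ListenAddr string
}

// Registry maps metric names to functions that return the current value.
type Registry struct {
	mu     sync.Mutex
	gauges map[string]func() float64
}

func NewRegistry() *Registry {
	return &Registry{gauges: make(map[string]func() float64)}
}

// Register adds a gauge; someone running a node could call this to add
// their own metrics.
func (r *Registry) Register(name string, read func() float64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.gauges[name] = read
}

// Render returns all gauges in Prometheus text format, sorted by name.
func (r *Registry) Render() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	names := make([]string, 0, len(r.gauges))
	for name := range r.gauges {
		names = append(names, name)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "# TYPE %s gauge\n%s %g\n", name, name, r.gauges[name]())
	}
	return b.String()
}

// NewMetricsServer returns nil when the endpoint is disabled in the config.
func NewMetricsServer(cfg InstrumentationConfig, reg *Registry) *http.Server {
	if !cfg.Enabled {
		return nil
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, reg.Render())
	})
	return &http.Server{Addr: cfg.ListenAddr, Handler: mux}
}

func main() {
	reg := NewRegistry()
	reg.Register("node_peers", func() float64 { return 3 })
	reg.Register("node_latest_block_height", func() float64 { return 42 })
	fmt.Print(reg.Render())

	srv := NewMetricsServer(InstrumentationConfig{Enabled: false}, reg)
	fmt.Println("server created:", srv != nil)
}
```

With `Enabled: true`, the caller would start the returned server with `srv.ListenAndServe()`; the sketch leaves that out so it runs without binding a port.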

I kind of disagree with the "post-launch" target, cause I wouldn't want to see the launch fail due to people not being able to monitor their validators properly :)

@ebuchman ebuchman modified the milestones: post-launch, launch Feb 23, 2018
@xla
Contributor

xla commented Mar 6, 2018

Do we need to change anything in Tendermint to support this?

While integrating the Prometheus client library and exposing the corresponding endpoint for Prometheus to scrape is relatively straightforward, I assume that gathering the metrics themselves will be the challenge.

@ebuchman @melekes Would a reasonable approach be to create a new service that hooks into the node's event bus to listen for significant events and observes timings and other information through that?

@t-bast

t-bast commented Mar 20, 2018

Something else worth considering for more powerful monitoring is integrating a tracing system. It's a bit more opinionated, but I think it has a lot of value for HA production systems.
I'd love to see an integration with OpenCensus (or an OpenTracing implementation like Jaeger). Even if at first it simply propagates the context in ABCI queries (without adding its own events), that would already be a very big win.
That would be an addition to Prometheus metrics, but the metrics could be derived from traces automatically (that's what both OpenCensus and Jaeger do).
I'm trying to set up OpenCensus tracing on our platform, and the fact that Tendermint sits in the middle and doesn't support it makes the traces a lot less useful.
It requires quite a good architectural knowledge of the Tendermint codebase to add though, so it will probably be quite tough to do as an external contribution :(

@xla
Contributor

xla commented Mar 20, 2018

@t-bast While both are in the realm of observability, they are not interchangeable. Could you open a new issue where we can discuss the merits and feasibility of tracing for Tendermint?

@t-bast

t-bast commented Mar 20, 2018

Thanks, done: #1342

@melekes melekes self-assigned this May 30, 2018
@melekes
Contributor

melekes commented Jun 7, 2018

Don't think we want to commit to any specific instrument like Prometheus. Our users should be free to choose whatever they want: statsd, Prometheus, etc. I like the go-kit metrics package. It supports multiple backends and provides a clean API (https://github.com/go-kit/kit/tree/master/metrics). OpenCensus looks cool. It comes with the benefit of tracing, which may be implemented later at some point. But its API looks a bit weird with views (https://opencensus.io/go/index.html).

@t-bast

t-bast commented Jun 7, 2018

As long as it's possible to build and plug into Tendermint an exporter to convert the metrics to the format of our choice (be it Prometheus, OpenCensus or anything else) we'd be happy ;)
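
The backend-agnostic approach discussed in the last two comments can be sketched as instrumented code depending only on a small interface, with the concrete exporter plugged in from outside. This is an illustration in the spirit of go-kit's metrics interfaces, not its actual API; `Gauge`, `ConsensusMetrics`, `NopMetrics`, and `commitBlock` are hypothetical names.

```go
package main

import "fmt"

// Gauge is the only thing instrumented code depends on. Any backend
// (Prometheus, statsd, ...) could provide an implementation.
type Gauge interface {
	Set(value float64)
}

// noopGauge discards values; useful when no backend is configured.
type noopGauge struct{}

func (noopGauge) Set(float64) {}

// memoryGauge is a stand-in backend that remembers the last value set.
type memoryGauge struct{ value float64 }

func (g *memoryGauge) Set(v float64) { g.value = v }

// ConsensusMetrics groups the gauges used by a (hypothetical) consensus module.
type ConsensusMetrics struct {
	Height Gauge
}

// NopMetrics returns metrics that do nothing.
func NopMetrics() *ConsensusMetrics {
	return &ConsensusMetrics{Height: noopGauge{}}
}

// commitBlock stands in for instrumented node code; it does not know
// which backend receives the value.
func commitBlock(m *ConsensusMetrics, height int64) {
	m.Height.Set(float64(height))
}

func main() {
	backend := &memoryGauge{}
	commitBlock(&ConsensusMetrics{Height: backend}, 100)
	commitBlock(NopMetrics(), 101) // no backend configured: the call is a no-op
	fmt.Println(backend.value)
}
```

The design choice here is that swapping Prometheus for another system only means supplying a different `Gauge` implementation when the metrics struct is constructed; the instrumented code stays unchanged.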

melekes added a commit that referenced this issue Jun 8, 2018
melekes added a commit that referenced this issue Jun 11, 2018
melekes added a commit that referenced this issue Jun 11, 2018
@melekes melekes mentioned this issue Jun 12, 2018
melekes added a commit that referenced this issue Jun 13, 2018
melekes added a commit that referenced this issue Jun 16, 2018
melekes added a commit that referenced this issue Jun 19, 2018
melekes added a commit that referenced this issue Jun 20, 2018
@melekes melekes closed this as completed Jun 24, 2018
firelizzard18 pushed a commit to AccumulateNetwork/tendermint that referenced this issue Feb 1, 2024
Closes tendermint#889

Remove one implementation of `deterministicExecTxResult` by deleting one copy and rendering the other visible outside its package.

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code comments