Allow proxying cluster metrics from all components through YT proxies #549

Open
achulkov2 opened this issue Apr 24, 2024 · 1 comment
Labels: assigned, enhancement (New feature or request)

achulkov2 commented Apr 24, 2024

Motivation

We want to expose an API for fetching metrics for the whole cluster solely from YT proxy hosts.

This is useful when the cluster provider does not want to expose monitoring ports for all components to an external client, restricting all interaction with YT to HTTP/RPC proxies and the UI.
In the future, it could also allow collecting metrics from CHYT and other services run inside operations (potentially even user operations) without implementing a standalone endpoint discovery mechanism for finding the monitoring endpoints of the corresponding jobs.

Technical details

I will outline a very loose plan below.

Each HTTP proxy will support a new route/command (without authorization), something along the lines of /fetch_metrics?shard_index=i&shard_count=n, where 0 <= i < n, which will return a portion of the metrics in the format specified by the request headers. The format will be specified the same way as when pulling metrics from a monitoring endpoint (e.g. -H "Accept: application/x-solomon-spack" -H "Accept-Encoding: zstd").
This can be used to form a sharded list of targets: my-cluster-proxy-balancer.yt/api/v4/fetch_metrics?shard_index=0&shard_count=20, my-cluster-proxy-balancer.yt/api/v4/fetch_metrics?shard_index=1&shard_count=20, ..., my-cluster-proxy-balancer.yt/api/v4/fetch_metrics?shard_index=19&shard_count=20.
The route should also accept a list of components as a parameter.
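
To make the target list concrete, here is a minimal Python sketch of how a monitoring client might generate it. The host name and route come from the example above; the `http` scheme, the `components` query parameter name, and its comma-separated encoding are assumptions made for illustration, since the proposal does not fix them.

```python
from urllib.parse import urlencode


def build_shard_targets(balancer, shard_count, components=None, scheme="http"):
    """Return one fetch_metrics URL per shard index in [0, shard_count).

    `components` and `scheme` are hypothetical: the issue only says the route
    should accept a list of components, not how it would be encoded.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    targets = []
    for shard_index in range(shard_count):
        params = {"shard_index": shard_index, "shard_count": shard_count}
        if components:
            # Assumed encoding: a single comma-separated parameter.
            params["components"] = ",".join(components)
        query = urlencode(params)
        targets.append(f"{scheme}://{balancer}/api/v4/fetch_metrics?{query}")
    return targets


if __name__ == "__main__":
    for url in build_shard_targets("my-cluster-proxy-balancer.yt", 20):
        print(url)
```

Each URL can then be scraped independently with the usual `Accept` and `Accept-Encoding` headers, so the scraper treats the shards as ordinary targets.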

The implementation will consist of an enriched TSolomonExporter (or a new extension of this class) that will be configured by a number of endpoint providers (similar to DiscoverVersions). It will need to handle the new sharding parameters by utilizing some form of consistent hashing over endpoints.
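
The sharding step could look roughly like the sketch below. It is written in Python for brevity rather than in the exporter's own C++, and it uses rendezvous (highest-random-weight) hashing as one possible form of consistent hashing; the proposal does not commit to a specific scheme. The function names and the endpoint strings are hypothetical. The hash must be stable across processes so that every proxy behind the balancer assigns a given endpoint to the same shard.

```python
import hashlib


def _score(endpoint: str, shard: int) -> int:
    """Stable pseudo-random weight for an (endpoint, shard) pair."""
    digest = hashlib.sha256(f"{endpoint}#{shard}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for_endpoint(endpoint: str, shard_count: int) -> int:
    """Pick the shard with the highest weight for this endpoint."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return max(range(shard_count), key=lambda shard: _score(endpoint, shard))


def endpoints_for_shard(endpoints, shard_index: int, shard_count: int):
    """Return the endpoints a request for (shard_index, shard_count) should poll."""
    if not 0 <= shard_index < shard_count:
        raise ValueError("shard_index must satisfy 0 <= shard_index < shard_count")
    return sorted(
        endpoint
        for endpoint in endpoints
        if shard_for_endpoint(endpoint, shard_count) == shard_index
    )


if __name__ == "__main__":
    # Hypothetical monitoring endpoints as an endpoint provider might return them.
    discovered = [f"node-{i:03d}.my-cluster.yt:10012" for i in range(10)]
    for index in range(3):
        print(index, endpoints_for_shard(discovered, index, 3))
```

With this scheme every endpoint belongs to exactly one shard for a given shard_count, and increasing shard_count by one only moves endpoints into the newly added shard. Changing the set of endpoints does not affect the assignment of the remaining ones.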
We can implement something similar to TSolomonExporter::AttachRemoteProcess, but instead of going through a separate DumpSensors call, we will collect metrics the way an end user would: by calling the existing solomon handle with the appropriate format specifiers. The results can then be parsed and joined using encoders/decoders from monlib.
The point of using the existing exporter is to avoid duplicating the existing output format selection and encoding (~this).

The endpoint providers will be configurable on the server side. It may also not be too hard to make them dynamically configurable, so that components discoverable through Cypress are simple to add or remove.

achulkov2 added the enhancement and assigned labels Apr 24, 2024
achulkov2 self-assigned this Apr 24, 2024

achulkov2 commented

TODO: Think about whether we want this command in RPC proxies.
