Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propose RRDD Protocol v3. #278

Merged
merged 1 commit into from
Apr 23, 2024
Merged

Conversation

TSnake41
Copy link
Contributor

Propose a new plugin protocol for OpenMetrics support and that is meant to be simpler to implement for plugins and parsers.
It uses OpenMetrics binary format (based on Protocol Buffers) that can be easily quite easily parsed and generated using reference or third-party protocol buffers implementations :

It could be interesting to consider implementations for various programming languages in the future, as protocol buffers is quite well supported and protocol format is pretty simple.


rrdd plugins protocol v2 report datasources via shared-memory file, however it
has various limitations :
- metrics are unique by their names, thus it is not possible cannot have
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't the current data sources go around this by attaching the VM UUID to the name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current protocol has these weaknesses in addition:

  • not properly versioned; the header should have a fixed structure and include versioning information such that the protocol can be evolved
  • The shared memory is allocated ahead of time, no matter how much space is actually needed, and can't be adjusted. It thus has to plan for the worst case.
  • Combination of a binary format with an ASCII format - thus not really taking advantage of either: readability and cross platform vs. compactness.

One of the better ideas is probably that it relies on a checksum; the buffer can be updated concurrently but this is detected by the checksum. which could be written atomically. However, I am not sure the checksum is properly aligned to guarantee atomic update.

All in all, I think the current protocol's design is weak and I'm happy to see a proposal for something better.

has various limitations :
- metrics are unique by their names, thus it is not possible cannot have
several metrics that shares a same name (e.g vCPU usage per vm)
- only number metrics are supported, for example we can't expose string
Copy link
Member

@psafont psafont Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the utility of having cpu models exposed in the metrics interface? this isn't supposed to change

From the openmetrics specs:

Values

Metric values in OpenMetrics MUST be either floating points or integers. Note that ingestors of the format MAY only support float64. The non-real values NaN, +Inf and -Inf MUST be supported. NaN MUST NOT be considered a missing value, but it MAY be used to signal a division by zero.

Are you sure strings must be supported for values?

In any case I think some more metadata might need to be transported as tags /label for each metric. This should be supported and what's important is that the data can be aggregated, tagged correctly and exposed with sensible metadata using the openmetrics format and the rrds for backwards compatibility

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition of "Values" in the specs probably correspond to number/scalar values, which is distinct from Metrics objects that can be strings for example.

The Info metric type is meant to transport text data that should not change, which can include CPU Model along other things.

[OpenMetrics](https://openmetrics.io/) support for the metrics daemon.

Moreover, it may not be practical for plugin developpers and parser implementations :
- json implementations may not keep insersion order on maps, which can cause
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON arrays could be used instead, which are ordered lists

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also enforce an order on the keys in a serialization.

Moreover, it may not be practical for plugin developpers and parser implementations :
- json implementations may not keep insersion order on maps, which can cause
issues to expose datasource values as it is sensitive to the order of the metadata map
- header length is not constant and depends on datasource count, which complicates parsing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this, type should be encoded in the data somehow, not in the header.

@lindig
Copy link
Contributor

lindig commented Jul 24, 2023

I am not sure the payload format is sufficiently defined. Is the OpenMetrics not versioned and sufficiently large such that we should say more which version is expected or supported?

| data checksum | 32 | uint32 | Checksum of the concatenation of the rest of the header (from timestamp) and the payload data
| timestamp | 64 | uint64 | Unix epoch
| payload length | 32 | uint32 | Payload length
| payload data | 8*(payload length) | binary | OpenMetrics encoded metrics data (protocol-buffers format)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format is a container format. It could be helpful to add a field that announces the format of the payload, in particular if we want to permit different versions in the payload. Otherwise we would have to encode this in the header string - which is still possible. If the OpemMetrics header is sufficiently identifying its version and format, it might not be needed.

Copy link
Contributor Author

@TSnake41 TSnake41 Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenMetrics protobuf format doesn't provide versioning.
It is possible to add a version field.

The only problem is on how to encode the version, we can still use a version table that maps a version "integer" into a OpenMetrics version. The implementation will have to use this field to check for compatibility. However, it is unclear for implementations how to choose what version to target (plugins should always use latest one available ?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"plugins should always use latest one available" is not a sustainable position and any client reading the data should be able to rely on version information to know if it is compatible. Otherwise a client would have to parse the data and on failure decide that it is incompatible. So I suggest to make a client's life easy by announcing formats or versions ahead of the data if the data by itself has no header that makes this easy.

Copy link
Contributor Author

@TSnake41 TSnake41 Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like a new RPC function for plugin registering using protocol v3 (like Plugin.Local.RegisterV3 ?) that has a field "versions" with a list of OpenMetrics versions (the plugin supports), and then the server answer with either "no version supported" or with the OpenMetrics version that must be used.

Signed-off-by: Teddy Astie <teddy.astie@outlook.fr>
ydirson pushed a commit to xcp-ng/xcp-metrics that referenced this pull request Aug 30, 2023
xcp-metrics exposes the same kind of metrics that xcp-rrdd does, but
using the OpenMetrics standard.

It uses an reworked version of the v2 protocol currently in use by xcp-rrdd,
proposed as v3 protocol in xapi-project/xapi-project.github.io#278

Signed-off-by: Teddy Astie <teddy.astie@outlook.fr>
Reviewed-by: Yann Dirson <yann.dirson@vates.fr>
ydirson pushed a commit to xcp-ng/xcp-metrics that referenced this pull request Aug 30, 2023
xcp-metrics exposes the same kind of metrics that xcp-rrdd does, but
using the OpenMetrics standard.

It uses an reworked version of the v2 protocol currently in use by xcp-rrdd,
proposed as v3 protocol in xapi-project/xapi-project.github.io#278

Signed-off-by: Teddy Astie <teddy.astie@outlook.fr>
Reviewed-by: Yann Dirson <yann.dirson@vates.fr>
ydirson pushed a commit to xcp-ng/xcp-metrics that referenced this pull request Sep 29, 2023
xcp-metrics exposes the same kind of metrics that xcp-rrdd does, but
using the OpenMetrics standard.

It uses an reworked version of the v2 protocol currently in use by xcp-rrdd,
proposed as v3 protocol in xapi-project/xapi-project.github.io#278

Signed-off-by: Teddy Astie <teddy.astie@outlook.fr>
Reviewed-by: Yann Dirson <yann.dirson@vates.fr>
ydirson pushed a commit to xcp-ng/xcp-metrics that referenced this pull request Oct 2, 2023
xcp-metrics exposes the same kind of metrics that xcp-rrdd does, but
using the OpenMetrics standard.

It uses an reworked version of the v2 protocol currently in use by xcp-rrdd,
proposed as v3 protocol in xapi-project/xapi-project.github.io#278

Signed-off-by: Teddy Astie <teddy.astie@outlook.fr>
Reviewed-by: Yann Dirson <yann.dirson@vates.fr>
ydirson pushed a commit to xcp-ng/xcp-metrics that referenced this pull request Oct 2, 2023
xcp-metrics exposes the same kind of metrics that xcp-rrdd does, but
using the OpenMetrics standard.

It uses an reworked version of the v2 protocol currently in use by xcp-rrdd,
proposed as v3 protocol in xapi-project/xapi-project.github.io#278

Signed-off-by: Teddy Astie <teddy.astie@outlook.fr>
Reviewed-by: Yann Dirson <yann.dirson@vates.fr>
ydirson pushed a commit to xcp-ng/xcp-metrics that referenced this pull request Oct 2, 2023
xcp-metrics exposes the same kind of metrics that xcp-rrdd does, but
using the OpenMetrics standard.

It uses an reworked version of the v2 protocol currently in use by xcp-rrdd,
proposed as v3 protocol in xapi-project/xapi-project.github.io#278

Signed-off-by: Teddy Astie <teddy.astie@outlook.fr>
Reviewed-by: Yann Dirson <yann.dirson@vates.fr>
Copy link
Member

@psafont psafont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Xenserver engineers have reviewed this new protocol: while it enhances the protocol compared to V2, it doesn't have some properties that are needed for Xenserver:

  • It keeps using the existing memory-mapped files from V2. We've found this to have some big issues. 1. The file can't be resized once it's created. 2. It's difficult to create correct clients with certain languages, even making the metrics daemon to crash.
  • It's difficult to retrofit to existing plugins: it forces them to use protobuf, sometimes that's just not trivial.
  • Ability to generate rrd from the metadata in openmetrics format needs to be possible, this is not clear currently.
  • timestamps are Unix timestamp: they do not increase monotonically, system-dependent monotonic timestamps should be used (time from last boot)

I don't think these are blockers for the protocol itself, but we're exploring alternatives, mainly a socket-based protocol, along with new methods in the IDL (to register and deregister); use the plain-text format to reduce complexity in the clients; and finally couple the new protocol with a registering / deregistering mechanism in the IDL so clients can use it to declare what protocol version are they exposing and how, so the metrics daemon doesn't need to guess, nor need one registration mechanism per protocol.

@robhoes
Copy link
Member

robhoes commented Apr 23, 2024

Let's merge this proposal, as it has merit on its own, keeping in mind that we are exploring further improvements.

@robhoes robhoes merged commit 37f4da7 into xapi-project:master Apr 23, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants