Define & document metric naming rules #3600

Closed
binarylogic opened this issue Aug 27, 2020 · 7 comments · Fixed by #3653
Assignees
Labels
domain: external docs Anything related to Vector's external, public documentation domain: metrics Anything related to Vector's metrics events type: task Generic non-code related tasks

Comments

@binarylogic
Contributor

binarylogic commented Aug 27, 2020

Similar to #3247, we should document how we name and tag metrics. Once #3519 is approved, we should materialize the rules from there (#3519 (comment)). It would be nice to include these in our public docs, since metric names have user-facing meaning.

@binarylogic binarylogic added type: task Generic non-code related tasks domain: external docs Anything related to Vector's external, public documentation domain: metrics Anything related to Vector's metrics events labels Aug 27, 2020
@jszwedko
Member

I'm wondering if we shouldn't apply these naming rules to our internal metrics as well. Currently we do not appear to.

@binarylogic
Contributor Author

Was thinking the same thing. We should.

@binarylogic binarylogic changed the title from "Document metric naming rules" to "Define & document metric naming rules" Aug 29, 2020
@jamtur01
Contributor

jamtur01 commented Aug 30, 2020

Here's a rough initial take:

Metric names

For metrics, Vector broadly follows the Prometheus metric naming standards. Hence, a metric name:

  • Must only contain valid characters, which are ASCII letters and digits, as well as underscores. It should match the regular expression: [a-zA-Z_][a-zA-Z0-9_]*

  • Must specify the purpose of the metrics, for example host_cpu_seconds_total, which specifies the time in seconds that a CPU spends in each mode on a host.

  • Should have a single word prefix that groups metrics from a specific source, for example host-based metrics like CPU, disk, and memory are prefixed with host, Apache metrics are prefixed with apache, etc. Vector calls this a namespace.

  • Should use a single base unit, for example seconds, bytes, meters.

  • Should end in a suffix that describes the unit in plural form: seconds, bytes. Accumulating counts, both with units and without, should end in total, for example disk_written_bytes_total and http_requests_total.

  • Where required, use tags to differentiate the characteristics of the measurement. For example, whilst host_cpu_seconds_total is the name of the metric, we also record the mode that is being used for each CPU. The mode and the specific CPU then become tags on the metric:

host_cpu_seconds_total{cpu="0",mode="idle"}
host_cpu_seconds_total{cpu="0",mode="nice"}
host_cpu_seconds_total{cpu="0",mode="system"}
host_cpu_seconds_total{cpu="0",mode="user"}
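
To make the character rule concrete, here is a minimal Python sketch of a validator. The helper name is_valid_metric_name is hypothetical and not part of Vector; it only checks the regular expression quoted in the draft above and says nothing about the namespace, unit, or suffix rules.

```python
import re

# Pattern quoted in the draft rules above: ASCII letters, digits and
# underscores, with no leading digit.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_metric_name(name: str) -> bool:
    """Hypothetical helper: True if the whole name matches the pattern."""
    # fullmatch (rather than match) rejects names with trailing junk.
    return METRIC_NAME_RE.fullmatch(name) is not None
```

For example, is_valid_metric_name("host_cpu_seconds_total") is true, while names such as "host-cpu" or "9lives" are rejected.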

@binarylogic binarylogic assigned jamtur01 and unassigned jszwedko Aug 31, 2020
@binarylogic
Contributor Author

Looking good. A few quick comments:

  1. The "Must specify the purpose of the metrics" probably isn't necessary if we do this next rule...
  2. Let's add a single rule that clarifies the template, <namespace>_<name>_<unit>_[total], and then expands on each variable as a sub-point.
  3. Decide and align on units. Ex: seconds, bytes, etc. While not a strict requirement, keeping these consistent will produce a much cleaner data catalog for our users.
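
As a rough illustration of the proposed template, the sketch below assembles a name from its parts. The function build_metric_name is hypothetical, and treating the unit as optional is an assumption made here so that unitless counters such as http_requests_total can be expressed.

```python
from typing import Optional


def build_metric_name(namespace: str, name: str,
                      unit: Optional[str] = None,
                      total: bool = False) -> str:
    """Hypothetical helper for <namespace>_<name>_<unit>_[total]."""
    parts = [namespace, name]
    if unit:
        # Assumption: the unit may be omitted for plain counts.
        parts.append(unit)
    if total:
        # Accumulating counts end in "total".
        parts.append("total")
    return "_".join(parts)
```

With these assumptions, build_metric_name("host", "cpu", "seconds", total=True) gives host_cpu_seconds_total.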

In general, the shorter and simpler, the more likely people will follow it. I know these are similar to the Prometheus page, but I find that to be verbose as well. Feel free to disagree. I was a math/CS major 😄 .

@jamtur01
Contributor

jamtur01 commented Aug 31, 2020

2nd cut:

Metric names

For metrics, Vector broadly follows the Prometheus metric naming standards. Hence, a metric name:

  • Must only contain valid characters, which are lowercase ASCII letters and digits, as well as underscores. It should match the regular expression: [a-z_][a-z0-9_]*.

  • Follows a broad template:

    <namespace>_<name>_<unit>_[total]

    • The namespace is a single word prefix that groups metrics from a specific source, for example host-based metrics like CPU, disk, and memory are prefixed with host, Apache metrics are prefixed with apache, etc.
    • The name describes what the metric measures.
    • The unit is a single base unit, for example seconds, bytes, meters.
    • The suffix should describe the unit in plural form: seconds, bytes. Accumulating counts, both with units and without, should end in total, for example disk_written_bytes_total and http_requests_total.
  • Where required, use tags to differentiate the characteristics of the measurement. For example, whilst host_cpu_seconds_total is the name of the metric, we also record the mode that is being used for each CPU. The mode and the specific CPU then become tags on the metric:

host_cpu_seconds_total{cpu="0",mode="idle"}
host_cpu_seconds_total{cpu="0",mode="nice"}
host_cpu_seconds_total{cpu="0",mode="system"}
host_cpu_seconds_total{cpu="0",mode="user"}
host_cpu_seconds_total
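
The name{tag="value"} notation used in the listing above can be sketched as follows. The function render_series is hypothetical and purely illustrative. It assumes simple tag values and does no escaping of quotes or backslashes, so it is not a full Prometheus exposition encoder.

```python
from typing import Mapping


def render_series(name: str, tags: Mapping[str, str]) -> str:
    """Hypothetical helper: format a metric name and tags as name{k="v",...}."""
    if not tags:
        return name
    # Sort by tag key so the output does not depend on insertion order.
    # Assumption: values contain no quotes or backslashes (no escaping here).
    pairs = ",".join(f'{key}="{value}"' for key, value in sorted(tags.items()))
    return f"{name}{{{pairs}}}"


# One series per CPU mode, all sharing the same metric name.
for mode in ("idle", "nice", "system", "user"):
    print(render_series("host_cpu_seconds_total", {"cpu": "0", "mode": mode}))
```

The point of the sketch is that the metric name stays fixed while the tags carry the varying dimensions (CPU and mode).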

@jszwedko
Member

👍 I like it.

I might suggest the following adjustments:

  • only using lowercase, making the pattern [a-z_][a-z0-9_]*; I don't see a reason to allow mixing of cases
  • noting that some metrics will not have units (usually just things that are counted, like http_requests_total)
  • <namespace>_<name>_<unit>_[total]; I think we may end up separating out the namespace into a separate field as part of #3609 (Make the namespace option on metrics sinks optional), in which case the namespace will simply be prepended for sinks that format it like this (statsd, prometheus) and used as a separate field for sinks that require it (aws_cloudwatch_metrics)

The list at the bottom of https://prometheus.io/docs/practices/naming/#base-units has some additional base units we can adopt (like bytes).
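
A minimal sketch of the namespace suggestion above, assuming the namespace is stored separately from the name. The Metric class and both helpers are hypothetical and do not reflect Vector's actual data model. The underscore separator is also an assumption, since a given sink may use a different one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Metric:
    """Hypothetical metric with the namespace kept as its own field."""
    name: str                        # e.g. "cpu_seconds_total"
    namespace: Optional[str] = None  # e.g. "host"


def flat_name(metric: Metric) -> str:
    """For sinks that want a single string (statsd- or prometheus-style)."""
    if metric.namespace:
        # Assumption: namespace and name are joined with an underscore.
        return f"{metric.namespace}_{metric.name}"
    return metric.name


def split_name(metric: Metric) -> Tuple[Optional[str], str]:
    """For sinks with a dedicated namespace field (aws_cloudwatch_metrics-style)."""
    return metric.namespace, metric.name
```

Keeping the namespace separate lets each sink decide how to present it, and a metric without a namespace still yields a valid flat name.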

@binarylogic
Contributor Author

👍 - let's add this to CONTRIBUTING.md.

jamtur01 added a commit that referenced this issue Aug 31, 2020
Signed-off-by: James Turnbull <james@lovedthanlost.net>
mengesb pushed a commit to jacobbraaten/vector that referenced this issue Dec 9, 2020
…ordotdev#3653)

Signed-off-by: Brian Menges <brian.menges@anaplan.com>