chore: Draft host-metrics RFC #3581

jamtur01 · 2020-08-26T17:17:01Z

Signed-off-by: James Turnbull <james@lovedthanlost.net>

jszwedko

🎉 I think this will be a very useful component.

jszwedko · 2020-08-26T20:34:21Z

rfcs/2020-08-26-3191-host-metrics.md

+
+Metrics will also be labeled with:
+
+- `host`: the host name of the host being monitored.


I think we'll probably need some more labels for some of the metrics to be useful:

- `device` for disk-based metrics
- `cpu` for CPU-based metrics
- `filesystem` for filesystem-based metrics

Sorry, I'm realizing I was unclear. I think you've addressed this above though with the update to the metrics list. I had meant, for example, that disk-based metrics should be labeled (or tagged; not sure which term we prefer) with the device the metric is associated with (e.g. `device=/dev/sda`).

I think I like these "collector" labels too though. I think they'd be better as simply `collector`, so you'd have things like `collector=disk`.

Ah. Yeah - this is a terminology mismatch. I'll update to say tagged and make it clearer.

jszwedko · 2020-08-26T20:38:38Z

rfcs/2020-08-26-3191-host-metrics.md

+
+## Outstanding Questions
+
+- One source or many? Should we have `host_metrics` or `cpu_metrics`, `mem_metrics`, `disk_metrics`, or `load_metrics`?


I think this is the big one.

I personally like your approach of a single source though. I think we could allow for fine-grained control over the metric "families" via the table-based TOML configuration, like:

    [sources.my_source_id]
    type = "host_metrics"
    disk.devices = ["/dev/sda"]
    filesystem.mountpoints = ["/home"]

That would avoid collecting all of them (though you could also do this in a filter transform).

From the point of view of user experience, I think a single source is better. It will end up being large in terms of source code, but we can manage that with sub-modules internally.

Definitely agree on having a single source

jamtur01 · 2020-08-26T21:09:56Z

@jszwedko Thanks! Updated to reflect your feedback.

bruceg · 2020-08-26T23:27:47Z

rfcs/2020-08-26-3191-host-metrics.md

+- `filesystem_avail_bytes` labeled with device, filesystem type, and mountpoint (gauge)
+- `filesystem_device_error` labeled with device, filesystem type, and mountpoint (gauge)
+- `filesystem_total_file_nodes` labeled with device, filesystem type, and mountpoint (gauge)
+- `filesystem_free_file_nodes` labeled with device, filesystem type, and mountpoint (gauge)
+- `filesystem_free_bytes` labeled with device, filesystem type, and mountpoint (gauge)
+- `filesystem_size_bytes` labeled with device, filesystem type, and mountpoint (gauge)


What is the distinction between avail and free or size? Also, there seems to be an inconsistency between avail/free/size for bytes, but free/total for file nodes.

I've drawn these from Prometheus' approach specifically.

filesystem_avail_bytes = Filesystem space available to non-root users in bytes.

filesystem_free_bytes = Filesystem free space in bytes.

filesystem_size_bytes = Filesystem size in bytes.

See https://www.robustperception.io/filesystem-metrics-from-the-node-exporter. I'll add an explainer to the RFC.
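To make the distinction concrete, here is a minimal Python sketch of how the three byte gauges could be derived from POSIX `statvfs`-style fields. It is an illustration only, not how Vector or the node exporter collects these values; the function name `filesystem_bytes` is hypothetical, and the mapping of each gauge to a field is an assumption based on the definitions quoted above.

```python
def filesystem_bytes(st):
    """Derive the three byte gauges from a statvfs-style result.

    `st` is any object exposing f_frsize, f_blocks, f_bfree and f_bavail,
    such as the value returned by os.statvfs() on Unix.
    """
    unit = st.f_frsize  # POSIX expresses the block counts in units of f_frsize
    return {
        # Space usable by non-root users.
        "filesystem_avail_bytes": st.f_bavail * unit,
        # All free space, including blocks reserved for root.
        "filesystem_free_bytes": st.f_bfree * unit,
        # Total size of the filesystem.
        "filesystem_size_bytes": st.f_blocks * unit,
    }
```

On a Unix host this could be called as `filesystem_bytes(os.statvfs("/"))` after `import os`. The gap between the free and avail values is the space reserved for root, which is why dashboards that alert on usable space tend to use the avail value.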

My comment about the inconsistency was to suggest, for example, we could use avail/free/total for both:

    filesystem_avail_bytes
    filesystem_free_bytes
    filesystem_total_bytes
    filesystem_avail_file_nodes
    filesystem_free_file_nodes
    filesystem_total_file_nodes

Unless, of course, there is a preference for sticking to Prometheus' terminology.

That's an interesting question - I replicated Prometheus' naming as a lot of folks are familiar with that. Also if folks have dashboards/alerts/etc that calculate things, like free space on a filesystem, then they don't need to rewrite them with new metric names.

🤔 I think the naming you (and Prometheus) have feels natural to me. I'm used to referring to filesystems as having a size, probably because of my association with a physical disk and the fact that files themselves are referred to as having "sizes" (and not "total bytes"). Inodes, on the other hand, feel more like a "discrete, countable" thing, so total makes sense to me for the cap.

I can see an argument for consistency though.

bruceg · 2020-08-26T23:35:12Z

rfcs/2020-08-26-3191-host-metrics.md

+  collecting = [ "all"] # optional, defaults collecting all metrics.
+  filesystem.mountpoints = [ "all" ] # optional, defaults to all mountpoints.
+  disk.devices = [ "all" ] # optional, defaults to all to disk devices.


I'd recommend against magic words within the list to represent collecting all. We could:

- not allow wildcard configuration like that, and just say the default is all metrics;
- use an actual wildcard to represent all, like `"*"`, which may also be useful for allowing globs like `"filesystem_read*"`; or
- use a custom serializer to allow either `"ALL"` outside of the list for all, or a list to enumerate items (i.e. `collecting = "ALL"` vs `collecting = [ "a", "b" ]`)

I like the globbing. +1.
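As a rough illustration of the glob option, the following Python sketch selects metric names with shell-style patterns using the standard library's `fnmatch` module. The helper name `select_metrics` and the sample list are hypothetical; the RFC does not specify the matching rules, so this only shows one plausible behavior.

```python
from fnmatch import fnmatchcase


def select_metrics(available, patterns):
    """Return the names matching at least one glob pattern, in their original order."""
    return [
        name
        for name in available
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Sample metric names taken from the filesystem list discussed in this PR.
metrics = [
    "filesystem_avail_bytes",
    "filesystem_free_bytes",
    "filesystem_size_bytes",
    "filesystem_free_file_nodes",
]

print(select_metrics(metrics, ["*"]))                 # "*" selects everything
print(select_metrics(metrics, ["filesystem_free*"]))  # only the "free" metrics
```

With this approach `"*"` is an ordinary pattern rather than a magic word, so the default of collecting everything and a narrower selection share one code path. Note that `fnmatch` also gives `?` and `[seq]` special meaning, which a real implementation might or might not want.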

bruceg · 2020-08-26T23:39:04Z

rfcs/2020-08-26-3191-host-metrics.md

+
+## Outstanding Questions
+
+- One source or many? Should we have `host_metrics` or `cpu_metrics`, `mem_metrics`, `disk_metrics`, or `load_metrics`?


From the point of view of user experience, I think a single source is better. It will end up being large in terms of source code, but we can manage that with sub-modules internally.

lukesteensen · 2020-08-27T20:10:10Z

rfcs/2020-08-26-3191-host-metrics.md

+
+## Outstanding Questions
+
+- One source or many? Should we have `host_metrics` or `cpu_metrics`, `mem_metrics`, `disk_metrics`, or `load_metrics`?


Definitely agree on having a single source

lukesteensen · 2020-08-27T20:10:47Z

rfcs/2020-08-26-3191-host-metrics.md

+
+I've found a number of possible Rust-based solutions for implementing this collection, cross-platform.
+
+- https://crates.io/crates/heim (Rust)


heim looks very promising and would likely be a good starting point.

binarylogic · 2020-08-27T23:49:22Z

rfcs/2020-08-26-3191-host-metrics.md

+
+## Internal Proposal
+
+Build a single source called `host_metrics` (name to be confirmed) to collect host/system level metrics.


I'm 👍 on `host_metrics` unless others disagree. An alternative could be `node_metrics`, but that seems less precise to me. I'm curious why Prometheus adopted this nomenclature.

binarylogic

Nice work on this. Only one comment on the collector name, otherwise looks great.

jszwedko · 2020-08-28T19:23:17Z

rfcs/2020-08-26-3191-host-metrics.md

+collectors = [ "cpu", "memory", "network" ]
+```
+
+For disk and network devices or filesystem mountpoints the default is to collect for all ("*") devices and mountpoints. Or you can configure Vector to only collect from specific devices, for example:


I'm wondering if it will be useful to also allow users to specify a denylist rather than an allowlist in the future. Maybe we could model this like `network.devices = ["*", "!eth0"]` to monitor everything but `eth0`? We could expand this pattern generally to this type of "filter" config.
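A minimal Python sketch of how such a combined allow/deny list could be evaluated is shown below. The function name `select_devices` is hypothetical, and the semantics are assumptions, not something the RFC defines: a leading `!` marks an exclusion, exclusions always win regardless of order, and a list containing only exclusions selects nothing.

```python
from fnmatch import fnmatchcase


def select_devices(devices, patterns):
    """Keep devices matching an include glob and no "!"-prefixed exclude glob."""
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]  # strip the "!"
    return [
        device
        for device in devices
        if any(fnmatchcase(device, p) for p in includes)
        and not any(fnmatchcase(device, p) for p in excludes)
    ]


# Everything except eth0, as in the example above.
print(select_devices(["eth0", "eth1", "lo"], ["*", "!eth0"]))
```

Keeping includes and excludes in one list keeps the configuration surface small, at the cost of reserving `!` as a prefix character in device and mountpoint patterns.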

jszwedko

I just had one last thought around letting people exclude resources rather than an allowlist (#3581 (comment)).

I think that could be tackled separately though, if we want. This looks good to me.

jamtur01 added 2 commits August 26, 2020 13:14

Draft host-metrics RFC

a7d0676

Signed-off-by: James Turnbull <james@lovedthanlost.net>

More edits

fdf155f

Signed-off-by: James Turnbull <james@lovedthanlost.net>

jamtur01 added this to the 2020.08.17 - On The Road Again milestone Aug 26, 2020

jamtur01 self-assigned this Aug 26, 2020

jamtur01 requested review from binarylogic, bruceg, JeanMertz and lukesteensen August 26, 2020 18:29

jamtur01 added the `domain: metrics` and `needs: approval` labels Aug 26, 2020

jamtur01 marked this pull request as ready for review August 26, 2020 18:30

jamtur01 requested a review from jszwedko August 26, 2020 18:30

Fixed typo

6ee375a

Signed-off-by: James Turnbull <james@lovedthanlost.net>

jamtur01 mentioned this pull request Aug 26, 2020

Add a host metrics source #3587

Closed

jszwedko reviewed Aug 26, 2020

Edit from Jesse

857bab3

Signed-off-by: James Turnbull <james@lovedthanlost.net>

jamtur01 added 5 commits August 26, 2020 17:10

fixed label

0ff8da9

Signed-off-by: James Turnbull <james@lovedthanlost.net>

Fixed typo

7736915

Signed-off-by: James Turnbull <james@lovedthanlost.net>

Fixed urls

f6009af

Signed-off-by: James Turnbull <james@lovedthanlost.net>

Fixed list

7468c21

Signed-off-by: James Turnbull <james@lovedthanlost.net>

Added network metrics

0a7a4c3

Signed-off-by: James Turnbull <james@lovedthanlost.net>

bruceg reviewed Aug 26, 2020

Addressed feedback from Bruce

e48aff8

Signed-off-by: James Turnbull <james@lovedthanlost.net>

binarylogic reviewed Aug 27, 2020

binarylogic mentioned this pull request Aug 27, 2020

chore(rfcs): Add RFC for Apache HTTP Server metrics source #3519

Merged

jamtur01 added 2 commits August 27, 2020 10:48

Updated label to tag

f2fdef6

Signed-off-by: James Turnbull <james@lovedthanlost.net>

Replaced label with tag

7e8fab7

Signed-off-by: James Turnbull <james@lovedthanlost.net>

lukesteensen approved these changes Aug 27, 2020

binarylogic reviewed Aug 27, 2020

binarylogic reviewed Aug 27, 2020

binarylogic self-requested a review August 28, 2020 01:05

binarylogic approved these changes Aug 28, 2020

Fb from Ben

d43d9fc

Signed-off-by: James Turnbull <james@lovedthanlost.net>

jszwedko reviewed Aug 28, 2020

jszwedko approved these changes Aug 28, 2020

Added comment about exclusion

abca508

Signed-off-by: James Turnbull <james@lovedthanlost.net>

jamtur01 merged commit 3cbf4af into master Aug 29, 2020

jamtur01 deleted the host_metrics_rfc branch August 29, 2020 02:30

binarylogic mentioned this pull request Aug 30, 2020

New host_metrics RFC #3191

Closed

3 tasks

jamtur01 mentioned this pull request Aug 31, 2020

chore: Added PostgreSQL RFC #3612

Merged

mengesb pushed a commit to jacobbraaten/vector that referenced this pull request Dec 9, 2020

chore: Draft host-metrics RFC (vectordotdev#3581)

6fc826f

Signed-off-by: Brian Menges <brian.menges@anaplan.com>
