feat(grafana): add initial grafana dashboard #14369
- add initial draft for Grafana full dashboard
Tested:
- Local tests
✅ Deploy Preview for vector-project ready!
Soak Test Results
Baseline: 0083d15
Explanation
A soak test is an integrated performance test for vector in a repeatable rig, with varying configuration for vector. What follows is a statistical summary of a brief vector run for each configuration across SHAs given above. The goal of these tests is to determine, quickly, if vector performance is changed and to what degree by a pull request. Where appropriate, units are scaled per-core.
The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between baseline and comparison SHAs, with 90.0% confidence, OR have been detected as newly erratic. Negative values mean that baseline is faster, positive comparison. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table will be omitted if no interesting changes are observed.
No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:
Fine details of change detection per experiment.
|
@spencergilbert could you please take a look at it? |
I'll definitely try and find time to review this week, I imagine it'll be somewhat of a pain reviewing without having access to the ui - so we probably want to consider how to support that locally 🤔 cc @binarylogic who's recently done some dashboard work and probably has some opinions. |
Thinking about it, I'm also not sure if we want to be the owners of the dashboard as (to my knowledge) none of us are using Grafana. It may be better to have a |
In its current state, this dashboard is useless. There are no Vector metrics. All I see are SQS, Kafka, and Kubernetes metrics. And even those don't work. |
They are located in the General panel, and the General tab has been tested locally with actual metrics from Vector. However, not all Vector metrics have been added yet, since I want to get feedback first.
Do you have the corresponding metrics in your Prometheus setup? I do not; I just inserted them into the dashboard as an example of grouping different metric groups on the dashboard. If this approach works for the Vector devs, I will continue working on it. |
@spencergilbert since you have no desire to maintain it in the main repo, could you please discuss internally in your team how we should deal with the community-supported tooling around Vector? From my perspective, additional When you discuss it, please write a note about the Vector dev team's decision here and we will move on with the dashboard. Thank you. |
While waiting for your decision, I have continued the work here: https://github.com/zamazan4ik/vector-community Current state - I have added initial panel versions for all metrics from |
@spencergilbert Did you discuss this PR somewhere internally? Are you still interested in this dashboard in the upstream? |
Discussing with @jszwedko tomorrow, we'd prefer not to start the repo/org without having some plans in place for how it would be managed/admin/etc. - hopefully can get a rough plan in place soon. |
Update regarding the current state of the dashboard. I have just uploaded a big bunch of changes. The dashboard now has all Vector metrics inside it, organized into some meaningful groups. All counter-based sample panels were transformed into rate-based ones. Added filtering support based on dashboard variables. The biggest issue is that I cannot test all metrics locally, because right now Vector has no ability to generate fake metrics, so I just tested a subset of them and they work. If anyone is interested in the dashboard - you are welcome to test it. Just download the dashboard, import it into your Grafana, select the data source and it should work. |
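For anyone reviewing the rate-based panels, here is a rough sketch of how such a query can be assembled. It is an illustration only, not the exact expressions used in the dashboard: the counter name `component_sent_events_total`, the dashboard variables `$component_id` and `$host`, and the helper `rate_query` are assumptions made for the example. `$__rate_interval` is Grafana's built-in interval variable.

```python
def rate_query(counter_suffix: str, group_by: str = "component_id") -> str:
    """Build a per-second rate query for a counter, filtered by dashboard variables.

    The variable names ($component_id, $host) are assumed for this sketch.
    """
    # Label filters driven by Grafana template variables (assumed names).
    filters = 'component_id=~"$component_id", host=~"$host"'
    # Match the counter under any namespace prefix, as the PR description does.
    selector = '{__name__=~".*_' + counter_suffix + '", ' + filters + "}"
    # rate() turns the ever-growing counter into events per second.
    return f"sum by ({group_by}) (rate({selector}[$__rate_interval]))"


if __name__ == "__main__":
    # Hypothetical counter name used only as an example.
    print(rate_query("component_sent_events_total"))
```

One caveat with this approach: if the name regex matches the same series under two different prefixes, `rate()` drops the metric name and the query can fail with a duplicate label set error, so the prefix question still matters.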
I discussed this briefly with @jszwedko and @spencergilbert . The preference is to eventually have it live in something like In the short term, keeping it in your repo and re-directing users there is good 👍 |
(FYI) We created a task internally to track the work to prep for a |
@zamazan4ik @neuronull any progress on this? |
No progress from my side since this PR is not going to be merged by Vector devs :) |
Hi!
In this PR I want to provide an initial Grafana dashboard for Vector. It resolves #4838.
Below I explain several choices that I made while working on the dashboard. I want to gather feedback from the maintainers and other people before taking further steps with the dashboard. Current state of this dashboard - early draft.
Implementation details
Inspiration
As an example, I have used the Node Exporter dashboard: https://grafana.com/grafana/dashboards/1860-node-exporter-full/ (as was mentioned here: #4838 (comment)).
Which metrics should be tracked on the dashboard?
I think that, as a base dashboard, we need to provide a full dashboard covering as many metrics as we can. Users can choose which panels they are interested in for their particular cases and just remove/disable unneeded panels on their own. Also, if we can identify common use cases from Vector users, we should put them into the dashboard as well. E.g. see this comment.
Which Grafana/Prometheus versions should be supported?
I am not a Grafana/Prometheus guru, so I do not know much about the differences between versions. I suggest that, at least for the first version, we target the latest available Grafana/Prometheus features. Later, if we see a desire from Vector users for backporting to older versions, we could think about it.
Rows on the dashboard
I have grouped related panels into groups (as an example, right now there are Kafka, Kubernetes and SQS groups; the General group is just for metrics without an explicit group). It should help users navigate the dozens of metrics in a more guided way.
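To make the grouping concrete, here is a minimal sketch of how rows can look in a dashboard JSON file, as far as I understand the Grafana dashboard JSON model: each group is a panel of type `row`, and a collapsed row carries its child panels in a nested `panels` list. The helper `row`, the dashboard title, and the empty child lists are placeholders for illustration, not the contents of the actual dashboard.

```python
import json


def row(title: str, panels: list, y: int, collapsed: bool = True) -> dict:
    """Return a row panel; child panels nest under a collapsed row."""
    return {
        "type": "row",
        "title": title,
        "collapsed": collapsed,
        # Rows span the full 24-column grid and are one unit high.
        "gridPos": {"h": 1, "w": 24, "x": 0, "y": y},
        "panels": panels,
    }


# Group names taken from the description above; child panels omitted here.
GROUPS = ["General", "Kafka", "Kubernetes", "SQS"]

dashboard = {
    "title": "Vector (sketch)",  # placeholder title
    "panels": [row(name, [], y) for y, name in enumerate(GROUPS)],
}

if __name__ == "__main__":
    print(json.dumps(dashboard, indent=2))
```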
Filtering capabilities
That is an interesting topic to discuss. You can see the following variables, added to the dashboard:
- DS_PROMETHEUS - a hack that I found in the Node Exporter dashboard. It allows selecting different data sources for the dashboard. Seems to be kinda useful, I guess.
- component_* (id, type, kind) - since almost all metrics have these labels, I think it would be useful to be able to filter dashboards via these labels.
- host, instance - the same reasoning as for the component_* group: some metrics have these labels, so it will be convenient to filter on them.
- job - an additional label that is added by Prometheus (actually VictoriaMetrics, since I use it locally) during configuration. It simply adds the label to all metrics from the data source. Maybe it would be a good idea to eliminate the DS_PROMETHEUS variable and just leave job?
But there is at least one more point for additional configuration: the metric prefix. Vector's internal_metrics source adds a namespace to all metrics. In real life, a user may want to gather metrics with different prefixes in one dashboard and be able to filter based on this prefix. That is why I use the { __name__ =~ ".*_metric_name" } notation for fetching metrics. It looks ugly, but it matches all prefixes. However, right now there is no way to choose which exact prefix the user is interested in, and I do not know how to implement it properly. Any ideas? :)
UPD: I have talked with colleagues and they suggested trying Prometheus relabeling capabilities. With relabeling we would be able to add specific labels based on the metric prefix, and then add a corresponding variable to the dashboard based on that label. Sounds good, but in this case users would be required to set up relabeling on their own.
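To illustrate the prefix-agnostic selector and the relabeling idea, here is a small sketch. It only emulates in Python what a relabeling rule would do (copy the prefix of `__name__` into a label); it is not a Prometheus configuration. The helper names, the label name `vector_namespace`, and the metric name `component_sent_events_total` are assumptions chosen for the example.

```python
import re


def selector(metric_suffix: str, **label_filters: str) -> str:
    """Build a selector that matches metric_suffix under any namespace prefix."""
    matchers = [f'__name__=~".*_{metric_suffix}"']
    # Each keyword argument becomes a regex label matcher, e.g. for a dashboard variable.
    matchers += [f'{label}=~"{value}"' for label, value in sorted(label_filters.items())]
    return "{" + ", ".join(matchers) + "}"


def add_namespace_label(name: str, labels: dict, metric_suffix: str) -> dict:
    """Emulate relabeling: copy the prefix of the metric name into a label.

    The label name "vector_namespace" is hypothetical.
    """
    match = re.fullmatch(rf"(.+)_{re.escape(metric_suffix)}", name)
    if match is None:
        # The name does not end with the suffix; leave the labels untouched.
        return dict(labels)
    return {**labels, "vector_namespace": match.group(1)}


if __name__ == "__main__":
    print(selector("component_sent_events_total", component_id="$component_id"))
    print(add_namespace_label(
        "myapp_component_sent_events_total",
        {"job": "vector"},
        "component_sent_events_total",
    ))
```

With such a label in place, a dashboard variable could filter on it the same way as on job or host, at the cost of every user having to configure the relabeling themselves.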
Alerting
No alerting is implemented yet. I think it is out of scope of this PR. However, some sample alerts could be added later in other PRs.
Additional details
After the dashboard is merged, I suggest publishing it to the Grafana site. Also, since the dashboard will definitely evolve in the future, we need to establish a process for regularly publishing the dashboard to the Grafana site (some kind of CI). Since the ownership of the dashboard will be with the Vector team, I guess you will need to implement it somehow :)
Also, when Vector introduces new metrics or changes existing ones, I think checking the Grafana dashboard should be added to some sort of checklist, since the Grafana dashboard should stay up to date.
And don't forget to mention this dashboard somewhere in the Vector documentation :)
If you have any sort of ideas/feedback/whatever else - please share it here!
Thanks in advance!