
Metrics breakdown per dashboard #11

Closed
mforgues opened this issue Dec 23, 2020 · 6 comments · Fixed by #12
Labels
bug Something isn't working

Comments

@mforgues

Hi,

I noticed something while trying out the frigga metrics tool.

It seems like the breakdown of the metrics per dashboard is a little off: the metrics detected for the first dashboard are also included in each subsequent one, and so on. The overall metrics list is correct. I also tried to find the issue in the Python code, but I am a beginner...

Example of what I mean:

"dashboards": {
"baremetal_detailed_v1": {
"gnet_id": "null",
"metrics": [
"idmgroup",
"instance",
"label_values",
"power_supply",
"processor",
"redfish_chassis_fan_health",
"redfish_chassis_fan_rpm_percentage",
"redfish_chassis_fan_state",
"redfish_chassis_network_adapter_health_state",
"redfish_chassis_network_adapter_state",
"redfish_chassis_network_port_health_state",
"redfish_chassis_network_port_state",
"redfish_chassis_power_average_consumed_watts",
"redfish_chassis_power_powersupply_health",
"redfish_chassis_power_powersupply_power_capacity_watts",
"redfish_chassis_power_powersupply_state",
"redfish_chassis_power_voltage_volts",
"redfish_chassis_temperature_celsius",
"redfish_chassis_temperature_sensor_state",
"redfish_system_health_state",
"redfish_system_memory_capacity",
"redfish_system_memory_health_state",
"redfish_system_memory_state",
"redfish_system_network_interface_health_state",
"redfish_system_network_interface_state",
"redfish_system_processor_health_state",
"redfish_system_processor_state",
"redfish_system_processor_total_cores",
"redfish_system_processor_total_threads",
"redfish_system_storage_drive_capacity",
"redfish_system_storage_drive_state",
"redfish_system_storage_volume_capacity",
"redfish_system_storage_volume_state",
"sensor",
"volume"
],
"num_metrics": 35
},
"cadvisor": {
"gnet_id": "null",
"metrics": [
"cadvisor_version_info",
"container_cpu_usage_seconds_total",
"container_last_seen",
"container_memory_max_usage_bytes",
"container_memory_rss",
"container_memory_usage_bytes",
"container_network_receive_bytes_total",
"container_network_transmit_bytes_total",
"container_spec_memory_limit_bytes",
"idmgroup",
"instance",
"label_values",
"power_supply",
"processor",
"redfish_chassis_fan_health",
"redfish_chassis_fan_rpm_percentage",
"redfish_chassis_fan_state",
"redfish_chassis_network_adapter_health_state",
"redfish_chassis_network_adapter_state",
"redfish_chassis_network_port_health_state",
"redfish_chassis_network_port_state",
"redfish_chassis_power_average_consumed_watts",
"redfish_chassis_power_powersupply_health",
"redfish_chassis_power_powersupply_power_capacity_watts",
"redfish_chassis_power_powersupply_state",
"redfish_chassis_power_voltage_volts",
"redfish_chassis_temperature_celsius",
"redfish_chassis_temperature_sensor_state",
"redfish_system_health_state",
"redfish_system_memory_capacity",
"redfish_system_memory_health_state",
"redfish_system_memory_state",
"redfish_system_network_interface_health_state",
"redfish_system_network_interface_state",
"redfish_system_processor_health_state",
"redfish_system_processor_state",
"redfish_system_processor_total_cores",
"redfish_system_processor_total_threads",
"redfish_system_storage_drive_capacity",
"redfish_system_storage_drive_state",
"redfish_system_storage_volume_capacity",
"redfish_system_storage_volume_state",
"sensor",
"volume"
],
"num_metrics": 44
},

Mathieu

@unfor19
Owner

unfor19 commented Dec 23, 2020

Hi @mforgues , thank you for your input!
Are you 100% sure that you don't have any panel/row that is common to both the "baremetal_v1" and "cadvisor" dashboards?

Could you share the JSON files of your dashboards? If not, please search for "redfish" in the "cadvisor" dashboard; you might find a few surprises over there.
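One quick way to check for shared panels or rows is to compare panel titles across the exported dashboard JSON. A minimal sketch, assuming Grafana's standard dashboard JSON layout (a top-level `panels` array with `title` fields, and row panels nesting their children in their own `panels` array); the inline dictionaries stand in for the real exported files:

```python
# Hypothetical overlap check; in practice, load each exported dashboard JSON file.
dash_a = {"panels": [{"title": "CPU"}, {"title": "Fans"}]}
dash_b = {"panels": [{"title": "CPU"}, {"title": "Memory"}]}

def panel_titles(dashboard):
    """Collect panel titles, descending into row panels if present."""
    titles = set()
    for panel in dashboard.get("panels", []):
        titles.add(panel.get("title", ""))
        for sub in panel.get("panels", []):  # rows nest their panels here
            titles.add(sub.get("title", ""))
    return titles

common = panel_titles(dash_a) & panel_titles(dash_b)
print(common)  # → {'CPU'}
```

An empty intersection would rule out shared panels as the cause of duplicated metrics.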

@mforgues
Author

Hi,

Sure, I can share the dashboards.

The cadvisor dashboard is actually the one that is provisioned automatically when using the command:

bash docker-compose/deploy_stack.sh

I am almost certain there is nothing redfish-related in the cadvisor dashboard, since Redfish is a protocol used to collect metrics from hardware (iDRACs, for example).

Thanks for making this tool; it's really useful for what I am trying to put in place.

Mathieu

dashboards.zip

@unfor19
Owner

unfor19 commented Dec 23, 2020

@mforgues Thanks for sharing the dashboards; I definitely want to investigate this issue. According to your metrics.json, it is clear that there's a problem.

I'll go over the Python code and see what's causing this.

And thank you for the positive feedback, much appreciated!

@mforgues
Author

Thanks.

I also ran it again just to make sure you have a proper .metrics.json file.

I added my baremetal dashboard to the 4 already-provisioned ones and got this:

frigga gl

Grafana url [http://localhost:3000]: http://172.18.0.5:3000
Grafana api key:

[LOG] Getting the list of words to ignore when scraping from Grafana
[LOG] Successfully got words from https://prometheus.io/docs/prometheus/latest/querying/functions/
[LOG] Successfully got words from https://prometheus.io/docs/prometheus/latest/querying/operators/
[LOG] Found 67 words to ignore in expressions
[LOG] Successful response from http://172.18.0.5:3000/api/search?query=
[LOG] Successful response from http://172.18.0.5:3000/api/dashboards/uid/redfish_v1
[LOG] Getting metrics from baremetal_detailed_v1
[LOG] Found 35 metrics
[LOG] Successful response from http://172.18.0.5:3000/api/dashboards/uid/Ss3q6hSZk
[LOG] Getting metrics from cadvisor
[LOG] Found 44 metrics
[LOG] Successful response from http://172.18.0.5:3000/api/dashboards/uid/U9Se3uZMz
[LOG] Getting metrics from jobs-usage
[LOG] Found 44 metrics
[LOG] Successful response from http://172.18.0.5:3000/api/dashboards/uid/rYdddlPWk
[LOG] Getting metrics from node-exporter-full
[LOG] Found 240 metrics
[LOG] Successful response from http://172.18.0.5:3000/api/dashboards/uid/NNrbK9ZGz
[LOG] Getting metrics from prometheus-2-0-overview
[LOG] Found 302 metrics
[LOG] Found a total of 302 unique metrics to keep

Attached is my .metrics.json.

I did try to look at and play with the Python code, but I guess I need to learn more to be able to find the issue :)

Also, the overall metrics list is correct; it's just the breakdown per dashboard that behaves like a cumulative total.

It works fine, of course, if you run it with only one dashboard.
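The symptom described above (each dashboard's list containing everything found so far, e.g. 35 metrics, then 44) matches a common Python pitfall: accumulating into one shared list across loop iterations instead of starting a fresh list per dashboard. A hypothetical sketch of the pattern, not frigga's actual code:

```python
# Illustrative only: contrasts a shared accumulator with a per-dashboard list.
def breakdown_buggy(dashboards):
    """Reuses one list for every dashboard, so each entry also
    contains all metrics found so far (cumulative breakdown)."""
    metrics = []  # shared accumulator, never reset per dashboard
    result = {}
    for name, dashboard_metrics in dashboards.items():
        metrics += sorted(dashboard_metrics)
        result[name] = list(metrics)
    return result

def breakdown_fixed(dashboards):
    """Builds a fresh list per dashboard, so each entry
    contains only that dashboard's own metrics."""
    result = {}
    for name, dashboard_metrics in dashboards.items():
        result[name] = sorted(dashboard_metrics)
    return result

dashboards = {
    "baremetal_detailed_v1": {"redfish_system_health_state", "instance"},
    "cadvisor": {"container_memory_rss", "instance"},
}
print(len(breakdown_buggy(dashboards)["cadvisor"]))  # 4: includes baremetal's metrics
print(len(breakdown_fixed(dashboards)["cadvisor"]))  # 2: only its own
```

The buggy variant reproduces the reported behavior exactly: it is correct for a single dashboard and only drifts once a second dashboard is processed.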

BR,

Mathieu

metrics.zip

@unfor19 unfor19 mentioned this issue Jan 12, 2021
@unfor19 unfor19 added the bug Something isn't working label Jan 12, 2021
@unfor19
Owner

unfor19 commented Jan 12, 2021

@mforgues I figured it out; see PR #12 for more details.
Update frigga to v1.0.8 to apply the fix:

pip install frigga==1.0.8

And again, thank you for your input

@mforgues
Author

Thanks @unfor19, much appreciated.

Will give it a try and let you know how it goes.
