
Metrics breakdown per dashboard #11

Closed
mforgues opened this issue Dec 23, 2020 · 6 comments · Fixed by #12
Labels
bug Something isn't working

Comments

@mforgues

Hi,

I noticed something while trying out the frigga metrics tool.

It seems like the breakdown of the metrics per dashboard is a little off: the metrics detected for the first dashboard are also included in each subsequent one, and so on. The overall metrics list is correct. I also tried to find the issue in the Python code, but I am a beginner...

Example of what I mean:

"dashboards": {
"baremetal_detailed_v1": {
"gnet_id": "null",
"metrics": [
"idmgroup",
"instance",
"label_values",
"power_supply",
"processor",
"redfish_chassis_fan_health",
"redfish_chassis_fan_rpm_percentage",
"redfish_chassis_fan_state",
"redfish_chassis_network_adapter_health_state",
"redfish_chassis_network_adapter_state",
"redfish_chassis_network_port_health_state",
"redfish_chassis_network_port_state",
"redfish_chassis_power_average_consumed_watts",
"redfish_chassis_power_powersupply_health",
"redfish_chassis_power_powersupply_power_capacity_watts",
"redfish_chassis_power_powersupply_state",
"redfish_chassis_power_voltage_volts",
"redfish_chassis_temperature_celsius",
"redfish_chassis_temperature_sensor_state",
"redfish_system_health_state",
"redfish_system_memory_capacity",
"redfish_system_memory_health_state",
"redfish_system_memory_state",
"redfish_system_network_interface_health_state",
"redfish_system_network_interface_state",
"redfish_system_processor_health_state",
"redfish_system_processor_state",
"redfish_system_processor_total_cores",
"redfish_system_processor_total_threads",
"redfish_system_storage_drive_capacity",
"redfish_system_storage_drive_state",
"redfish_system_storage_volume_capacity",
"redfish_system_storage_volume_state",
"sensor",
"volume"
],
"num_metrics": 35
},
"cadvisor": {
"gnet_id": "null",
"metrics": [
"cadvisor_version_info",
"container_cpu_usage_seconds_total",
"container_last_seen",
"container_memory_max_usage_bytes",
"container_memory_rss",
"container_memory_usage_bytes",
"container_network_receive_bytes_total",
"container_network_transmit_bytes_total",
"container_spec_memory_limit_bytes",
"idmgroup",
"instance",
"label_values",
"power_supply",
"processor",
"redfish_chassis_fan_health",
"redfish_chassis_fan_rpm_percentage",
"redfish_chassis_fan_state",
"redfish_chassis_network_adapter_health_state",
"redfish_chassis_network_adapter_state",
"redfish_chassis_network_port_health_state",
"redfish_chassis_network_port_state",
"redfish_chassis_power_average_consumed_watts",
"redfish_chassis_power_powersupply_health",
"redfish_chassis_power_powersupply_power_capacity_watts",
"redfish_chassis_power_powersupply_state",
"redfish_chassis_power_voltage_volts",
"redfish_chassis_temperature_celsius",
"redfish_chassis_temperature_sensor_state",
"redfish_system_health_state",
"redfish_system_memory_capacity",
"redfish_system_memory_health_state",
"redfish_system_memory_state",
"redfish_system_network_interface_health_state",
"redfish_system_network_interface_state",
"redfish_system_processor_health_state",
"redfish_system_processor_state",
"redfish_system_processor_total_cores",
"redfish_system_processor_total_threads",
"redfish_system_storage_drive_capacity",
"redfish_system_storage_drive_state",
"redfish_system_storage_volume_capacity",
"redfish_system_storage_volume_state",
"sensor",
"volume"
],
"num_metrics": 44
},

Mathieu

@unfor19
Owner

unfor19 commented Dec 23, 2020

Hi @mforgues , thank you for your input!
Are you 100% sure that you don't have any panel/row that is common to both the "baremetal_v1" and "cadvisor" dashboards?

Could you share the JSON files of your dashboards? If not, please search for "redfish" in the "cadvisor" dashboard; you might find a few surprises over there.
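One quick way to check for shared panels or rows is to compare panel titles across the exported dashboard JSON. A minimal sketch, assuming Grafana's standard dashboard JSON layout (a top-level `panels` array with `title` fields, and row panels nesting their children in their own `panels` array); the inline dictionaries stand in for the real exported files:

```python
# Hypothetical overlap check; in practice, load each exported dashboard JSON file.
dash_a = {"panels": [{"title": "CPU"}, {"title": "Fans"}]}
dash_b = {"panels": [{"title": "CPU"}, {"title": "Memory"}]}

def panel_titles(dashboard):
    """Collect panel titles, descending into row panels if present."""
    titles = set()
    for panel in dashboard.get("panels", []):
        titles.add(panel.get("title", ""))
        for sub in panel.get("panels", []):  # rows nest their panels here
            titles.add(sub.get("title", ""))
    return titles

common = panel_titles(dash_a) & panel_titles(dash_b)
print(common)  # → {'CPU'}
```

An empty intersection would rule out shared panels as the cause of duplicated metrics.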

@mforgues
Author

Hi,

Sure, I can share the dashboards.

The cadvisor dashboard is actually the one that is provisioned automatically when using the command:

bash docker-compose/deploy_stack.sh

I am almost certain there is nothing redfish-related in the cadvisor dashboard, since Redfish is a protocol used to collect metrics from hardware (iDRACs, for example).

Thanks for making this tool; it's really useful for what I am trying to put in place.

Mathieu

dashboards.zip

@unfor19
Owner

unfor19 commented Dec 23, 2020

@mforgues Thanks for sharing the dashboards; I definitely want to investigate this issue. According to your metrics.json, it is clear that there's a problem.

I'll go over the Python code and see what's causing this.

And thank you for the positive feedback, much appreciated!

@mforgues
Author

Thanks.

I also ran it again just to make sure you have a proper .metrics.json file.

I added my baremetal dashboard to the 4 already-provisioned ones and got this:

frigga gl

Grafana url [http://localhost:3000]: http://172.18.0.5:3000
Grafana api key:

[LOG] Getting the list of words to ignore when scraping from Grafana
[LOG] Successfully got words from https://prometheus.io/docs/prometheus/latest/querying/functions/
[LOG] Successfully got words from https://prometheus.io/docs/prometheus/latest/querying/operators/
[LOG] Found 67 words to ignore in expressions
[LOG] Successful response from http://172.18.0.5:3000/api/search?query=
[LOG] Successful response from http://172.18.0.5:3000/api/dashboards/uid/redfish_v1
[LOG] Getting metrics from baremetal_detailed_v1
[LOG] Found 35 metrics
[LOG] Successful response from http://172.18.0.5:3000/api/dashboards/uid/Ss3q6hSZk
[LOG] Getting metrics from cadvisor
[LOG] Found 44 metrics
[LOG] Successful response from http://172.18.0.5:3000/api/dashboards/uid/U9Se3uZMz
[LOG] Getting metrics from jobs-usage
[LOG] Found 44 metrics
[LOG] Successful response from http://172.18.0.5:3000/api/dashboards/uid/rYdddlPWk
[LOG] Getting metrics from node-exporter-full
[LOG] Found 240 metrics
[LOG] Successful response from http://172.18.0.5:3000/api/dashboards/uid/NNrbK9ZGz
[LOG] Getting metrics from prometheus-2-0-overview
[LOG] Found 302 metrics
[LOG] Found a total of 302 unique metrics to keep

Attached is my .metrics.json.

I did try to look at and play with the Python code, but I guess I need to learn more to be able to find the issue :)

Also, the overall metrics list is correct; it's just the breakdown per dashboard that behaves like a cumulative total.

It works fine, of course, if you run it with only one dashboard.
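The symptom described above (each dashboard's list containing everything found so far, e.g. 35 metrics, then 44) matches a common Python pitfall: accumulating into one shared list across loop iterations instead of starting a fresh list per dashboard. A hypothetical sketch of the pattern, not frigga's actual code:

```python
# Illustrative only: contrasts a shared accumulator with a per-dashboard list.
def breakdown_buggy(dashboards):
    """Reuses one list for every dashboard, so each entry also
    contains all metrics found so far (cumulative breakdown)."""
    metrics = []  # shared accumulator, never reset per dashboard
    result = {}
    for name, dashboard_metrics in dashboards.items():
        metrics += sorted(dashboard_metrics)
        result[name] = list(metrics)
    return result

def breakdown_fixed(dashboards):
    """Builds a fresh list per dashboard, so each entry
    contains only that dashboard's own metrics."""
    result = {}
    for name, dashboard_metrics in dashboards.items():
        result[name] = sorted(dashboard_metrics)
    return result

dashboards = {
    "baremetal_detailed_v1": {"redfish_system_health_state", "instance"},
    "cadvisor": {"container_memory_rss", "instance"},
}
print(len(breakdown_buggy(dashboards)["cadvisor"]))  # 4: includes baremetal's metrics
print(len(breakdown_fixed(dashboards)["cadvisor"]))  # 2: only its own
```

The buggy variant reproduces the reported behavior exactly: it is correct for a single dashboard and only drifts once a second dashboard is processed.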

BR,

Mathieu

metrics.zip

@unfor19 unfor19 mentioned this issue Jan 12, 2021
@unfor19 unfor19 added the bug Something isn't working label Jan 12, 2021
@unfor19
Owner

unfor19 commented Jan 12, 2021

@mforgues I figured it out; see PR #12 for more details.
Update frigga to v1.0.8 to apply the fix:

pip install frigga==1.0.8

And again, thank you for your input

@mforgues
Author

Thanks @unfor19, much appreciated.

Will give it a try and let you know how it goes.
