Segmentation fault(coredump) when using -c or -C options #26

Open
dks0296586 opened this issue Nov 1, 2022 · 7 comments

@dks0296586

dks0296586 commented Nov 1, 2022

We have deployed the exporter to approximately 200 AIX servers of various versions and TL levels with no issues.

There are 10 servers, all running at least AIX 7.1, that are having issues.
When we set either -C or -c and Prometheus initiates the scrape, we get a segmentation fault. This happens on every version of the exporter we have tested (1.14.3.0, 1.12.1.0, 1.8.0.0, and possibly others).

./node_exporter_aix -p 50005 -a -cmdif
Node exporter for AIX version 1.14.3.0 listening on port 50005
Segmentation fault(coredump)

We tested the debug version that was posted in another segmentation fault issue, and got a little extra info:

./node_exporter_aix_debug -p 50005 -a -cmdif
Node exporter for AIX version 1.12.1.0 listening on port 50005
Number of cpu records: 160
Segmentation fault(coredump)

We found that 9 of the 10 servers have 8 SMT threads with over 128 virtual CPUs allocated.
All of the servers that are working have fewer than 64 virtual CPUs.

Is there a limit on the number of CPUs that we could be hitting that causes the segmentation faults?
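To see which hosts are near that CPU count, one option is to read the `System configuration:` header that AIX `lparstat` prints, which includes the SMT mode (`smt=`) and logical CPU count (`lcpu=`). The sketch below assumes that header format; the function name `lpar_cpu_summary` is hypothetical and not part of the exporter.

```shell
#!/bin/sh
# Sketch (assumption): pull the SMT mode and logical CPU count out of the
# "System configuration:" header line that AIX lparstat prints.
# lpar_cpu_summary is a hypothetical helper name, not part of the exporter.
lpar_cpu_summary() {
  # Keep only the first header line from lparstat output on stdin,
  # then print it as "smt=<mode> lcpu=<count>".
  grep 'System configuration:' | head -n 1 |
    sed -n 's/.*smt=\([^ ]*\).*lcpu=\([0-9][0-9]*\).*/smt=\1 lcpu=\2/p'
}

# Usage on an AIX host (one 1-second sample):
#   lparstat 1 1 | lpar_cpu_summary
```

Running this across the fleet makes it easy to compare the failing hosts' `lcpu` values against the ones that scrape cleanly.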

@mattdurham

Can you give https://github.com/grafana/node_exporter_aix/releases/tag/v1.15.6 a whirl? We're testing it with some of our users and it solved their segmentation faults; I'd love to see if it also solves your issue. Once it's baked in a bit, I'm going to submit a PR to upstream the changes.

@dks0296586
Author

We were able to confirm that 120 logical CPUs is fine, but adding one more processor (at SMT8, bringing the total to 128 logical CPUs) causes the segmentation fault.

We will give this a try today!

@dks0296586
Author

This version seems to be working initially with only "-c" on 128+ logical CPUs (tested up to 168). Definitely an improvement.
The "-C" option is still causing the same segmentation fault.

@mattdurham

mattdurham commented Nov 4, 2022

https://github.com/grafana/node_exporter_aix/releases/tag/v1.15.7 <- give this a whirl. The -C option goes through a different code path than the other collectors, so I had to change that one too.

@dks0296586
Author

That seems to be running with no segmentation faults!

While working through this issue, we noticed that our CPU usage % doesn't seem to be reported correctly on this or the older versions. Have you noticed this?
This probably doesn't belong in this thread; I can start a new one to discuss it.

@mattdurham

I haven't, but it's not something I have looked into. If you want to start a new discussion and tag me with the exact details, I can take a look.

@lbsivahari

Please refer to pull requests #33, #34, and #35 and check whether they fix your issue.
