Exporter fails due to "AllocGRES is deprecated" fatal error #40

Open
tgooding opened this issue Feb 17, 2021 · 4 comments

@tgooding

The latest prometheus-slurm-exporter runs for a few seconds before terminating with a fatal error:

prometheus-slurm-exporter/bin/prometheus-slurm-exporter   
INFO[0000] Starting Server: :8080                        source="main.go:48"
FATA[0004] exit status 1                                 source="gpus.go:101"

I'm running slurm-20.11.3-1, and a rebuild picked up the new gpus.go module. Digging into it a bit, it appears that sacct now treats the deprecated Allocgres format field as a fatal error, which causes the Execute() routine to terminate:

sh-4.4$ sacct -a -X --format=Allocgres --state=RUNNING --noheader --parsable2
sacct: fatal: AllocGRES is deprecated, please use AllocTRES
@mtds mtds self-assigned this Feb 23, 2021
@mtds
Collaborator

mtds commented Feb 23, 2021

Interesting. I can't test this at the moment, since we are still running version 18.08.8. I can try to reproduce it on VMs and look for a workaround, but I suspect the only solution (not a particularly nice one) is an if/else based on the output of sacct.

@biocyberman

I faced the same problem because I am running SLURM 20.11.3. The problem is at this line: https://github.com/vpenso/prometheus-slurm-exporter/blob/master/gpus.go#L41
I changed it to
args := []string{"-a", "-X", "--format=AllocTRES", "--state=RUNNING", "--noheader", "--parsable2"}
Rebuild, reinstall, and it works.

@mtds mtds mentioned this issue Mar 4, 2021
@mtds
Collaborator

mtds commented Mar 4, 2021

This can be a thorny issue: we rely on the output of the Slurm command-line utilities. Whenever the SchedMD developers change the format or drop some options, this exporter cannot cope.

As issue #38 also suggests, at the moment we can only guarantee that this exporter runs on Slurm version 18.x.x. Higher versions may have problems.

@lahwaacz
Contributor

I changed it to
args := []string{"-a", "-X", "--format=AllocTRES", "--state=RUNNING", "--noheader", "--parsable2"}
Rebuild, reinstall, and it works.

It's not that simple: the format of the output changed as well. More info is already in #38.
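
To illustrate why the parser has to change too, here is a minimal sketch of counting GPUs from AllocTRES output. It assumes each line is a comma-separated list of key=value pairs such as `billing=8,cpu=8,gres/gpu=2,mem=32G,node=1`; that sample line is an assumption, not output captured in this thread, so check it against your own sacct output and #38. The function names `gpusInTRES` and `totalGPUs` are hypothetical and not part of the exporter.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gpusInTRES returns the GPU count from one AllocTRES line.
// Only the exact key "gres/gpu" is read; typed entries such as
// "gres/gpu:tesla" are skipped so the same GPUs are not counted twice
// (assumption: the untyped total is present whenever typed entries are).
func gpusInTRES(tres string) int {
	for _, kv := range strings.Split(strings.TrimSpace(tres), ",") {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 || parts[0] != "gres/gpu" {
			continue
		}
		n, err := strconv.Atoi(parts[1])
		if err != nil {
			continue // ignore values that are not plain integers
		}
		return n
	}
	return 0
}

// totalGPUs sums the GPU counts over all lines of sacct output.
func totalGPUs(output string) int {
	total := 0
	for _, line := range strings.Split(output, "\n") {
		total += gpusInTRES(line)
	}
	return total
}

func main() {
	// Assumed sample output: one job with two GPUs, one job with none.
	sample := "billing=8,cpu=8,gres/gpu=2,mem=32G,node=1\n" +
		"billing=4,cpu=4,mem=16G,node=1\n"
	fmt.Println(totalGPUs(sample)) // prints 2
}
```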
