Some low-level errors (like pynvml.nvml.NVMLError_LibRmVersionMismatch) result in nothing printed (std or diagnostic) #147

Open
munael opened this issue Dec 20, 2022 · 1 comment

munael commented Dec 20, 2022

Describe the bug

Something caused a version mismatch somewhere and I can no longer use gpustat. Nothing at all is printed on stdout or stderr, and running with --debug prints nothing either. I launched it as python -m pdb -m gpustat and stepped through until I noticed an error being raised at:

/opt/conda/lib/python3.8/site-packages/pynvml/nvml.py(718)

of type pynvml.nvml.NVMLError_LibRmVersionMismatch.
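
One way to confirm the failure outside gpustat is to call NVML directly and print the exception class. The following is a minimal sketch, not gpustat's own code: `probe` is a hypothetical helper, and using `pynvml.nvmlInit` as the call to test is an assumption, since the report does not say which NVML call raised the error.

```python
def probe(init_fn):
    """Call an NVML-style function and describe the outcome as a string.

    Returns "ok" on success, otherwise "<module>.<ExceptionClass>: <message>".
    """
    try:
        init_fn()
    except Exception as exc:
        return f"{type(exc).__module__}.{type(exc).__name__}: {exc}"
    return "ok"


if __name__ == "__main__":
    try:
        import pynvml
    except ImportError:
        print("pynvml is not installed")
    else:
        # Assumption: nvmlInit is the failing call; the issue does not confirm this.
        status = probe(pynvml.nvmlInit)
        print(status)
        if status == "ok":
            pynvml.nvmlShutdown()
```

If the mismatch is hit during initialization, this should print the exception class by name, which makes the failure visible even when the calling tool prints nothing.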

Screenshots or Program Output

Please provide the output of gpustat --debug and nvidia-smi. Or attach screenshots if applicable.

Environment information:

  • OS: Ubuntu 20.04
  • NVIDIA Driver version: 510.73.08
  • The name(s) of GPU card: Tesla V100-SXM2
  • gpustat version: 1.0.0
  • pynvml version: 11.495.46

Additional context

Add any other context about the problem here.

munael added the bug label Dec 20, 2022
wookayin (Owner) commented Dec 24, 2022

Can you please provide a full stacktrace from gpustat --debug (or with pdb)? On your side nothing is printed, right? I'd like to know which nvml... call throws the error.

In pdb you can run (Pdb) bt to obtain the full stacktrace in post-mortem mode.
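
If an interactive pdb session is inconvenient, the same call chain can be collected programmatically from the exception's traceback. This is a rough sketch using only the standard library; `stack_of` is a hypothetical helper, and the `inner`/`outer` functions merely simulate a low-level failure rather than a real NVML call.

```python
import traceback


def stack_of(fn):
    """Run fn and return the call chain of any exception it raises.

    Each entry is "file:line in function", outermost frame first.
    Returns an empty list if fn completes without raising.
    """
    try:
        fn()
    except Exception as exc:
        frames = traceback.extract_tb(exc.__traceback__)
        # For an interactive session, pdb.post_mortem(exc.__traceback__)
        # could be called here instead of formatting the frames.
        return [f"{f.filename}:{f.lineno} in {f.name}" for f in frames]
    return []


if __name__ == "__main__":
    def inner():
        raise RuntimeError("simulated low-level failure")

    def outer():
        inner()

    for line in stack_of(outer):
        print(line)
```

The last entry is the frame that raised, which is the piece of information requested here: which call actually throws.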
