Process not displayed #157

Closed
ryanxingql opened this issue Jun 30, 2023 · 3 comments

Comments

ryanxingql commented Jun 30, 2023

Describe the bug

My process is displayed by nvidia-smi but not by gpustat.

Screenshots or Program Output

image

40901                       Fri Jun 30 11:52:07 2023  535.54.03
[0] NVIDIA GeForce RTX 4090 | 50°C,  73 % | 12894 / 24564 MB | user1(12874M) gdm(4M)
[1] NVIDIA GeForce RTX 4090 | 52°C,  49 % | 12894 / 24564 MB | user1(12874M) gdm(4M)
[2] NVIDIA GeForce RTX 4090 | 45°C, 100 % | 15202 / 24564 MB | user1(15182M) gdm(4M)
[3] NVIDIA GeForce RTX 4090 | 57°C,  99 % | 21305 / 24564 MB | user1(15180M) gdm(4M)
[4] NVIDIA GeForce RTX 4090 | 55°C,  71 % |  7444 / 24564 MB | user2(7418M) gdm(4M)
[5] NVIDIA GeForce RTX 4090 | 63°C,  99 % |  5330 / 24564 MB | user2(5304M) gdm(4M)
[6] NVIDIA GeForce RTX 4090 | 25°C,   0 % |    13 / 24564 MB | gdm(4M)
[7] NVIDIA GeForce RTX 4090 | 52°C,  52 % |  4652 / 24564 MB | user3(4632M) gdm(4M)

PID 3617115 on GPU 3, which occupies 6100 MiB of memory, is not displayed by gpustat.
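The gap can also be read off the gpustat output itself: on GPU 3 the reported usage is 21305 MB, while the listed processes add up to only 15184 MB. The following is a minimal sketch of that arithmetic, assuming the one-line-per-GPU text format shown above; the regular expressions and the helper name `unaccounted_mb` are illustrative and are not part of gpustat.

```python
import re

# Matches "[index] ... | used / total MB | process list" as printed above.
# This pattern is an assumption based on the sample output, not a gpustat API.
LINE_RE = re.compile(r"\[(\d+)\].*?\|\s*(\d+)\s*/\s*(\d+)\s*MB\s*\|(.*)$")
# Matches the per-process memory figures such as "(15180M)".
PROC_RE = re.compile(r"\((\d+)M\)")


def unaccounted_mb(line):
    """Return (gpu_index, used_mb - sum of listed process memory), or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    index = int(match.group(1))
    used = int(match.group(2))
    listed = sum(int(mb) for mb in PROC_RE.findall(match.group(4)))
    return index, used - listed


if __name__ == "__main__":
    sample = [
        "[0] NVIDIA GeForce RTX 4090 | 50°C,  73 % | 12894 / 24564 MB | user1(12874M) gdm(4M)",
        "[3] NVIDIA GeForce RTX 4090 | 57°C,  99 % | 21305 / 24564 MB | user1(15180M) gdm(4M)",
    ]
    for line in sample:
        index, gap = unaccounted_mb(line)
        print(f"GPU {index}: {gap} MB not attributed to any listed process")
```

A small gap (a few MB) appears on every GPU in the sample, so only a large one, like the roughly 6 GB on GPU 3, suggests a missing process entry.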

Environment information:

  • OS: Ubuntu 22.04.2 LTS (GNU/Linux 5.19.0-45-generic x86_64)
  • NVIDIA Driver version: NVIDIA-SMI 535.54.03; Driver Version: 535.54.03; CUDA Version: 12.2
  • The name(s) of GPU card: NVIDIA GeForce RTX 4090
  • gpustat version: gpustat 1.1
  • pynvml version: nvidia-ml-py 11.525.112

ryanxingql added the bug label Jun 30, 2023

wookayin (Owner) commented Jul 9, 2023

Do you see any useful message from gpustat --debug?

ryanxingql (Author) commented

> Do you see any useful message from gpustat --debug?

Hi,
The output messages of gpustat and gpustat --debug are the same.

wookayin (Owner) commented

NVIDIA drivers 535.xx are broken. Versions 535.54 and 535.86 are affected: in these, nvmlProcessInfo_st has an extra field that should not be there (it has since been removed and the change reverted).
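To see why one extra struct field can make process entries go missing, here is a minimal ctypes sketch of the general mechanism. It is an illustration under stated assumptions, not the actual NVML definition: the field names, the name `extraField`, and both class names are hypothetical. One side writes an array of the longer struct and the other side reads the same bytes with the shorter layout, so every entry after the first is read from the wrong offset.

```python
import ctypes

# Hypothetical field layout for illustration only; not copied from NVML headers.
COMMON_FIELDS = [
    ("pid", ctypes.c_uint),
    ("usedGpuMemory", ctypes.c_ulonglong),
    ("gpuInstanceId", ctypes.c_uint),
    ("computeInstanceId", ctypes.c_uint),
]


class ProcInfoShort(ctypes.Structure):
    """Layout the reader expects (no extra field)."""
    _fields_ = list(COMMON_FIELDS)


class ProcInfoLong(ctypes.Structure):
    """Layout the writer uses (one extra trailing field)."""
    _fields_ = list(COMMON_FIELDS) + [("extraField", ctypes.c_ulonglong)]


# The writer fills two entries using the longer layout.
written = (ProcInfoLong * 2)(
    ProcInfoLong(1111, 100 << 20, 0, 0, 0),
    ProcInfoLong(2222, 200 << 20, 0, 0, 0),
)

# The reader interprets the same bytes with the shorter layout.
raw = bytes(written)
misread = (ProcInfoShort * 2).from_buffer_copy(raw)

if __name__ == "__main__":
    print("sizeof short:", ctypes.sizeof(ProcInfoShort))
    print("sizeof long: ", ctypes.sizeof(ProcInfoLong))
    for i, entry in enumerate(misread):
        print(f"entry {i}: pid={entry.pid} usedGpuMemory={entry.usedGpuMemory}")
```

The first entry still reads correctly because both layouts agree on the leading fields; the second entry starts inside the first entry's extra field, so its PID no longer matches any real process.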

A workaround is to use nvidia-ml-py == 12.535.77 (it must be exactly this version) with the buggy NVIDIA drivers 535.54 and 535.86. Other versions (such as the latest as of now, nvidia-ml-py 12.535.108) will not report the correct process information.
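The rule above can be written down as a small check. This is a sketch only: the helper name `recommended_pin` is hypothetical, and matching on the major.minor part of the driver version is an assumption drawn from the two versions named in this thread.

```python
from typing import Optional

# Driver series reported as affected in this thread.
# Assumption: a driver is matched on its major.minor components only.
AFFECTED_DRIVER_SERIES = {(535, 54), (535, 86)}

# The exact nvidia-ml-py version named as the workaround.
PINNED_NVIDIA_ML_PY = "12.535.77"


def recommended_pin(driver_version: str) -> Optional[str]:
    """Return the nvidia-ml-py version to pin for an affected driver, else None."""
    parts = driver_version.strip().split(".")
    try:
        series = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        # Unparseable version string: make no recommendation.
        return None
    return PINNED_NVIDIA_ML_PY if series in AFFECTED_DRIVER_SERIES else None


if __name__ == "__main__":
    # Example version strings; only 535.54.03 comes from this issue.
    for version in ("535.54.03", "535.86.05", "525.105.17"):
        pin = recommended_pin(version)
        if pin is None:
            print(f"driver {version}: no pin needed by this rule")
        else:
            print(f"driver {version}: pip install nvidia-ml-py=={pin}")
```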

Let me track this issue in #161.

wookayin closed this as not planned (won't fix, can't repro, duplicate, stale) Oct 29, 2023