pynvml not support lookup process info #105

Closed
hstk30 opened this issue Jul 22, 2021 · 12 comments
@hstk30

hstk30 commented Jul 22, 2021

When I call nvmlDeviceGetGraphicsRunningProcesses, it raises the exception below.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/gitProject/venv/siren/lib64/python3.6/site-packages/pynvml/nvml.py in _nvmlGetFunctionPointer(name)
    759         try:
--> 760             _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
    761             return _nvmlGetFunctionPointer_cache[name]

/usr/lib64/python3.6/ctypes/__init__.py in __getattr__(self, name)
    355             raise AttributeError(name)
--> 356         func = self.__getitem__(name)
    357         setattr(self, name, func)

/usr/lib64/python3.6/ctypes/__init__.py in __getitem__(self, name_or_ordinal)
    360     def __getitem__(self, name_or_ordinal):
--> 361         func = self._FuncPtr((name_or_ordinal, self))
    362         if not isinstance(name_or_ordinal, int):

AttributeError: /lib64/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetGraphicsRunningProcesses_v2

During handling of the above exception, another exception occurred:

NVMLError_FunctionNotFound                Traceback (most recent call last)
<ipython-input-5-6d9d0902fdc2> in <module>
----> 1 nvmlDeviceGetGraphicsRunningProcesses(handle)

~/gitProject/venv/hstk/lib64/python3.6/site-packages/pynvml/nvml.py in nvmlDeviceGetGraphicsRunningProcesses(handle)
   2179
   2180 def nvmlDeviceGetGraphicsRunningProcesses(handle):
-> 2181     return nvmlDeviceGetGraphicsRunningProcesses_v2(handle)
   2182
   2183 def nvmlDeviceGetAutoBoostedClocksEnabled(handle):

~/gitProject/venv/hstk/lib64/python3.6/site-packages/pynvml/nvml.py in nvmlDeviceGetGraphicsRunningProcesses_v2(handle)
   2147     # first call to get the size
   2148     c_count = c_uint(0)
AttributeError                            Traceback (most recent call last)
~/gitProject/venv/hstk/lib64/python3.6/site-packages/pynvml/nvml.py in _nvmlGetFunctionPointer(name)
    759         try:
--> 760             _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
    761             return _nvmlGetFunctionPointer_cache[name]

/usr/lib64/python3.6/ctypes/__init__.py in __getattr__(self, name)
    355             raise AttributeError(name)
--> 356         func = self.__getitem__(name)
    357         setattr(self, name, func)

/usr/lib64/python3.6/ctypes/__init__.py in __getitem__(self, name_or_ordinal)
    360     def __getitem__(self, name_or_ordinal):
--> 361         func = self._FuncPtr((name_or_ordinal, self))
    362         if not isinstance(name_or_ordinal, int):

AttributeError: /lib64/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetGraphicsRunningProcesses_v2

During handling of the above exception, another exception occurred:

NVMLError_FunctionNotFound                Traceback (most recent call last)
<ipython-input-6-85e61951ad1d> in <module>
----> 1 nvmlDeviceGetGraphicsRunningProcesses_v2(handle)

~/gitProject/venv/hstk/lib64/python3.6/site-packages/pynvml/nvml.py in nvmlDeviceGetGraphicsRunningProcesses_v2(handle)
   2147     # first call to get the size
   2148     c_count = c_uint(0)
-> 2149     fn = _nvmlGetFunctionPointer("nvmlDeviceGetGraphicsRunningProcesses_v2")
   2150     ret = fn(handle, byref(c_count), None)
   2151

~/gitProject/venv/hstk/lib64/python3.6/site-packages/pynvml/nvml.py in _nvmlGetFunctionPointer(name)
    761             return _nvmlGetFunctionPointer_cache[name]
    762         except AttributeError:
--> 763             raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
    764     finally:
    765         # lock is always freed

NVMLError_FunctionNotFound: Function Not Found

So I guess a change in pynvml may have caused this problem.
#72

@Stonesjtu
Collaborator

Can you provide your pynvml version and nvidia-smi output?

@Stonesjtu Stonesjtu self-assigned this Jul 22, 2021
@hstk30
Author

hstk30 commented Jul 22, 2021

Below are the nvidia-smi and gpustat outputs. My pynvml version is 11.0.0.

nvidia-smi output

gpustat output

@wookayin
Owner

The pynvml module is provided by several different packages: we are using nvidia-ml-py3, but it seems that you are using a different one. Can you please try installing nvidia-ml-py3 and see if it works? We will move to the official bindings, in which case such conflicts might go away.

@hstk30
Author

hstk30 commented Jul 30, 2021

image

@wookayin I have definitely installed nvidia-ml-py3, but it still does not work.

@wookayin
Owner

wookayin commented Jul 30, 2021

Your pynvml seems okay. So the following doesn't work for you, right?

import pynvml
pynvml.nvmlInit()
pynvml._nvmlGetFunctionPointer("nvmlDeviceGetGraphicsRunningProcesses_v2")

Your shared library path is /lib64/libnvidia-ml.so.1, which looks suspicious. Mine is /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 (symlinked to ..so.4XX.XX). How did you install the NVIDIA drivers?

Can you please provide an output of ldconfig -p | grep nvidia-ml? Have you tried running ldconfig? Do you have libnvidia-ml.so in /usr/lib/x86_64-linux-gnu?
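
The same symbol lookup can be tried without going through pynvml at all, which helps separate a driver-library problem from a Python-binding problem. The following is a minimal sketch, assuming a Linux system where `ctypes.util.find_library` can locate the NVML library; the helper names `exports_symbol` and `check_nvml_symbol` are made up for illustration and are not part of pynvml or gpustat.

```python
import ctypes
import ctypes.util


def exports_symbol(lib, symbol):
    """Return True if looking up `symbol` on `lib` succeeds."""
    try:
        # ctypes raises AttributeError for an undefined symbol,
        # as seen in the traceback at the top of this issue.
        getattr(lib, symbol)
    except AttributeError:
        return False
    return True


def check_nvml_symbol(symbol):
    """Return True/False for the symbol, or None if NVML cannot be loaded."""
    name = ctypes.util.find_library("nvidia-ml")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    return exports_symbol(lib, symbol)


# Compare the versioned symbol from the traceback with the unversioned one.
for symbol in (
    "nvmlDeviceGetGraphicsRunningProcesses_v2",
    "nvmlDeviceGetGraphicsRunningProcesses",
):
    print(symbol, check_nvml_symbol(symbol))
```

If the unversioned name is found but the `_v2` name is not, the installed driver library is simply older than what the Python binding expects, and the binding (not the driver path) is the thing to look at.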

@hstk30
Author

hstk30 commented Aug 2, 2021

image

This is my ldconfig -p | grep nvidia-ml result.
image

Maybe the directory is different because my OS is CentOS?

@hstk30
Author

hstk30 commented Aug 2, 2021

Oh, I have two Python virtual environments: one works well, and the other is the one mentioned above.

@wookayin
Owner

wookayin commented Aug 2, 2021

In [4]: pynvml
<module 'pynvml' from '........................./site-packages/pynvml.py'>

In that environment you probably have a broken, wrong pynvml installed. /lib64/libnvidia-ml.so is probably fine if yours is CentOS.

@hstk30
Author

hstk30 commented Aug 3, 2021

I found the problem. In my working venv, pynvml is just a single file:

In [1]: pynvml
<module 'pynvml' from '........................./lib64/python3.6/site-packages/pynvml.py'>

and in the failing venv, pynvml is a Python package:

In [1]: pynvml
<module 'pynvml' from '........................./lib64/python3.6/site-packages/pynvml/__init__.py'>

So I removed the pynvml package and copied the pynvml.py file into the site-packages directory, and now gpustat works well.

But I don't know why this difference exists. I'm sure I installed the packages with pip in the normal way, so this difference should not occur.
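
The difference between the two environments can be detected without importing pynvml. This is a minimal sketch using the standard library's `importlib.util.find_spec`; the function name `module_layout` is hypothetical.

```python
import importlib.util


def module_layout(name):
    """Classify a top-level import name as 'missing', 'module', or 'package'."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "missing"
    # Packages (directories such as pynvml/__init__.py) expose submodule
    # search locations; single-file modules such as pynvml.py do not.
    if spec.submodule_search_locations is not None:
        return "package"
    return "module"


print(module_layout("pynvml"))
```

Going by this thread, "module" corresponds to the working venv (site-packages/pynvml.py) and "package" to the failing one (site-packages/pynvml/__init__.py).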

@hstk30 hstk30 closed this as completed Aug 3, 2021
@wookayin
Owner

wookayin commented Aug 3, 2021

@hstk30 Thanks for the information. pynvml is expected to be a single-file module if gpustat has been installed properly. Maybe other packages you installed have a dependency that conflicts with pynvml.

I am curious what they are -- could you describe how those got installed? Maybe this one (which is a third-party fork)? I am going to add some warning messages in case an incompatible pynvml is found.

BTW,

and copy the pynvml.py file to site-packages directory,

Actually, you can reinstall gpustat or run pip install -I nvidia-ml-py3 to reinstall pynvml.
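
Before reinstalling, it may help to see which installed distributions ship a top-level `pynvml`. The following is a minimal sketch using `importlib.metadata` (available in Python 3.8+, so newer than the Python 3.6 shown in the traceback); `providers_of` is a hypothetical helper, and it relies on each distribution recording its file list, which not every install method does.

```python
from importlib import metadata


def providers_of(top_level):
    """Return names of installed distributions that ship `top_level`
    either as a package directory or as a single `<top_level>.py` file."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue
        # dist.files is None when the distribution has no file record.
        for path in dist.files or []:
            parts = path.parts
            if parts and parts[0] in (top_level, top_level + ".py"):
                found.add(name)
                break
    return sorted(found)


print(providers_of("pynvml"))
```

If more than one distribution shows up in the result, they are competing for the same import name, which would match the conflict described in this thread.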

@hstk30
Author

hstk30 commented Aug 4, 2021

I guess maybe HanLP depends on pynvml, but I'm not sure.

@wookayin
Owner

wookayin commented Aug 4, 2021

@hstk30 That seems correct. It should never use pynvml as a dependency; actually this package should never have existed.
