TF loads/initializes DevicePlugins multiple times if there are symlinked paths in sys.path. #55497
Labels: 2.6.0, comp:runtime, stat:awaiting tensorflower, type:bug
System information
Describe the current behavior
When TF starts up, the __init__.py checks sys.path for paths to site-packages, which are then used for loading PluggableDevices. If the identical file is accessible through a symlink from a different path, e.g.:
.../lib/python3.8/site-packages/tensorflow-plugins/libmydevice.so
.../lib64/python3.8/site-packages/tensorflow-plugins/libmydevice.so
with lib64 being a symlink to lib (as is the case for VENVs), then TensorFlow loads the library twice and dies with an error.
In https://github.com/tensorflow/tensorflow/blob/master/tensorflow/c/c_api_experimental.cc#L724 a map keyed on the library path as a std::string is used. It cannot tell that the two paths refer to the same file through a symlink, and therefore the initialization methods within the PluggableDevice library are called a second time.
Describe the expected behavior
The library does not get initialized twice.
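The difference between the current and the expected behavior can be illustrated with a small Python analogy. This is only a sketch, not TensorFlow's code: all function names are hypothetical, the "plugin" is an empty placeholder file, and the (device, inode) pair stands in for whatever identity is stable across symlinks (such as the handle returned by dlopen).

```python
import os
import tempfile


def init_keyed_by_path(plugin_paths):
    """Mimic a registry keyed by the path string: an alias looks like a new library."""
    loaded = {}
    init_calls = 0
    for path in plugin_paths:
        if path not in loaded:
            loaded[path] = True
            init_calls += 1  # stands in for the plugin's initialization routine
    return init_calls


def init_keyed_by_identity(plugin_paths):
    """Key on (device, inode), which is the same for a file and its symlinked alias."""
    seen = set()
    init_calls = 0
    for path in plugin_paths:
        st = os.stat(path)  # os.stat follows symlinks
        identity = (st.st_dev, st.st_ino)
        if identity not in seen:
            seen.add(identity)
            init_calls += 1
    return init_calls


def make_symlinked_plugin_paths(root):
    """Create root/lib/.../libmydevice.so plus a lib64 -> lib symlink; return both paths."""
    rel = os.path.join("python3.8", "site-packages", "tensorflow-plugins")
    os.makedirs(os.path.join(root, "lib", rel))
    real = os.path.join(root, "lib", rel, "libmydevice.so")
    with open(real, "wb"):
        pass  # empty placeholder, not a real shared library
    os.symlink("lib", os.path.join(root, "lib64"))
    alias = os.path.join(root, "lib64", rel, "libmydevice.so")
    return [real, alias]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        paths = make_symlinked_plugin_paths(root)
        print(init_keyed_by_path(paths))      # 2: initialized twice
        print(init_keyed_by_identity(paths))  # 1: initialized once
```

Keying on file identity rather than on the path string is what makes the second lookup a no-op. Resolving each path with os.path.realpath before the lookup would be another option on the Python side.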
Contributing
I think the easiest solution would be to change TF_LoadPluggableDeviceLibrary to use a set of library handles instead of a map. LoadDynamicLibrary uses dlopen, which ALWAYS returns the same handle for the same library, independent of symlinks, see: https://man7.org/linux/man-pages/man3/dlopen.3.html. A call to dlclose is then needed to decrease the reference count in case the lib got opened multiple times.
Standalone code to reproduce the issue
We encountered this error when using a VENV with the rh-python38 package, because it puts both lib and lib64 into sys.path. It can also be triggered by forging the PYTHONPATH env var.
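A reproduction along these lines might look like the following sketch. It assumes a PluggableDevice library is already installed under a tensorflow-plugins directory in the given site-packages, and that the second path reaches the same directory through a symlink. The helper names are hypothetical, and the actual paths depend on the environment.

```python
import os
import sys


def build_repro_env(site_packages, alias_site_packages, base_env=None):
    """Return an environment whose PYTHONPATH lists the same directory under two names."""
    env = dict(os.environ if base_env is None else base_env)
    entries = [site_packages, alias_site_packages]
    existing = env.get("PYTHONPATH")
    if existing:
        entries.append(existing)  # keep whatever was already configured
    env["PYTHONPATH"] = os.pathsep.join(entries)
    return env


def build_repro_command(python=sys.executable):
    """Importing TensorFlow is enough to trigger the plugin scan at startup."""
    return [python, "-c", "import tensorflow"]
```

Passing the result to subprocess.run(build_repro_command(), env=build_repro_env(real_dir, alias_dir)), where real_dir and alias_dir are the lib and lib64 variants of the same site-packages directory, should then start TensorFlow with both entries in sys.path.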