
problem importing tensorflow with tensorflow-gpu pip package and Nvidia PRIME #9915

Closed
FedericoMuciaccia opened this issue May 15, 2017 · 7 comments
Labels
stat:awaiting response Status - Awaiting response from author stat:contribution welcome Status - Contributions welcome type:feature Feature requests

Comments

@FedericoMuciaccia

System information

  • OS Platform and Distribution: Linux Ubuntu 16.10
  • TensorFlow installed from: binary
  • TensorFlow version: tensorflow-gpu-1.1.0
  • CUDA/cuDNN version:
  • GPU model and memory: GeForce 940MX 982MiB
  • Exact command to reproduce: import tensorflow

== cat /etc/issue ===============================================
Linux Lyn 4.8.0-51-generic #54-Ubuntu SMP Tue Apr 25 16:32:21 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
VERSION="16.10 (Yakkety Yak)"
VERSION_ID="16.10"
VERSION_CODENAME=yakkety

== are we in docker =============================================
No

== compiler =====================================================
c++ (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

== uname -a =====================================================
Linux Lyn 4.8.0-51-generic #54-Ubuntu SMP Tue Apr 25 16:32:21 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

== check pips ===================================================
numpy (1.12.1)
numpydoc (0.6.0)
protobuf (3.3.0)
tensorflow-gpu (1.1.0)

== check for virtualenv =========================================
False

== tensorflow import ============================================
2017-05-15 16:22:31.009080: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-15 16:22:31.009102: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-15 16:22:31.009124: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-05-15 16:22:31.009131: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-15 16:22:31.009139: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-05-15 16:22:31.119107: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-05-15 16:22:31.119494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: GeForce 940MX
major: 5 minor: 0 memoryClockRate (GHz) 0.993
pciBusID 0000:01:00.0
Total memory: 982.12MiB
Free memory: 675.25MiB
2017-05-15 16:22:31.119518: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-05-15 16:22:31.119526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-05-15 16:22:31.119542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0)
tf.VERSION = 1.1.0
tf.GIT_VERSION = v1.1.0-rc0-61-g1ec6ed5
tf.COMPILER_VERSION = v1.1.0-rc0-61-g1ec6ed5
Sanity check: array([1], dtype=int32)

== env ==========================================================
LD_LIBRARY_PATH is unset
DYLD_LIBRARY_PATH is unset

== nvidia-smi ===================================================
Mon May 15 16:21:29 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39 Driver Version: 375.39 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce 940MX Off | 0000:01:00.0 Off | N/A |
| N/A 43C P0 N/A / N/A | 262MiB / 982MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1017 G /usr/lib/xorg/Xorg 168MiB |
| 0 1860 G /usr/bin/compiz 41MiB |
| 0 2324 G ...el-token=2DD3BDBDD08C58317A0131100BC13BC1 52MiB |
+-----------------------------------------------------------------------------+

== cuda libs ===================================================


tf.GIT_VERSION
v1.1.0-rc0-61-g1ec6ed5

tf.VERSION
1.1.0

Describe the problem

I have a laptop with a dedicated Nvidia GPU, which I use only for prototyping my TensorFlow code.
But dedicated GPUs drain a lot of energy and reduce the laptop's battery life.
So when I'm out on battery (e.g. in the library at university) I always set Nvidia PRIME to use the integrated card only (run nvidia-settings in a console to reach this setting).
With previous versions of tensorflow-gpu (installed via pip3 on Ubuntu) everything worked well.
With the current release I can no longer use tensorflow-gpu while the Nvidia card is disabled with PRIME.
Now, to be able to work with TensorFlow AND have enough battery to get through my day, I have to install the pip package "tensorflow" (and not "tensorflow-gpu"). But that becomes useless if, for some reason, I need to test my code with GPU acceleration by turning the dedicated graphics card back on via Nvidia PRIME.
If I really want GPU acceleration I have to re-enable the dedicated card in PRIME, uninstall tensorflow and reinstall tensorflow-gpu, every single time. That's a mess!
To me, there are two ways to resolve this bug:

  1. make tensorflow-gpu able again to handle situations where all the Nvidia GPUs are temporarily disabled.
  2. unify the tensorflow and tensorflow-gpu packages, adding logic that lets the software handle these hybrid situations, which I think will become very common on laptops in the near future.
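Option 2 above could be sketched as a runtime probe: try to dlopen the CUDA driver library at import time and fall back to a CPU backend when none is found. This is purely illustrative and not TensorFlow's actual mechanism; `libcuda.so.1` is the standard Linux CUDA driver soname, and the backend names are hypothetical.

```python
import ctypes

# Hypothetical sketch of option 2: a single package that probes for a
# usable CUDA driver at import time and falls back to the CPU backend
# when none is found. libcuda.so.1 is the standard driver soname; the
# backend selection logic is illustrative only.
def cuda_driver_available():
    """Return True if the dynamic loader can resolve the CUDA driver."""
    try:
        ctypes.CDLL("libcuda.so.1")
        return True
    except OSError:
        return False

backend = "gpu" if cuda_driver_available() else "cpu"
print("selected backend:", backend)
```

On a PRIME laptop with the dedicated card powered off, the probe would fail and the import would still succeed, just without GPU kernels.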

Source code / logs

import tensorflow

ImportError Traceback (most recent call last)
/home/federico/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py in ()
40 sys.setdlopenflags(_default_dlopen_flags | ctypes.RTLD_GLOBAL)
---> 41 from tensorflow.python.pywrap_tensorflow_internal import *
42 from tensorflow.python.pywrap_tensorflow_internal import version

/home/federico/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py in ()
27 return _mod
---> 28 _pywrap_tensorflow_internal = swig_import_helper()
29 del swig_import_helper

/home/federico/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py in swig_import_helper()
23 try:
---> 24 _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
25 finally:

/usr/lib/python3.5/imp.py in load_module(name, file, filename, details)
241 else:
--> 242 return load_dynamic(name, filename, file)
243 elif type_ == PKG_DIRECTORY:

/usr/lib/python3.5/imp.py in load_dynamic(name, path, file)
341 name=name, loader=loader, origin=path)
--> 342 return _load(spec)
343

ImportError: libnvidia-fatbinaryloader.so.375.39: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

ImportError Traceback (most recent call last)
in ()
----> 1 import tensorflow

/home/federico/.local/lib/python3.5/site-packages/tensorflow/__init__.py in ()
22
23 # pylint: disable=wildcard-import
---> 24 from tensorflow.python import *
25 # pylint: enable=wildcard-import
26

/home/federico/.local/lib/python3.5/site-packages/tensorflow/python/__init__.py in ()
49 import numpy as np
50
---> 51 from tensorflow.python import pywrap_tensorflow
52
53 # Protocol buffers

/home/federico/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py in ()
50 for some common reasons and solutions. Include the entire stack trace
51 above this error message when asking for help.""" % traceback.format_exc()
---> 52 raise ImportError(msg)
53
54 # pylint: enable=wildcard-import,g-import-not-at-top,unused-import,line-too-long

ImportError: Traceback (most recent call last):
File "/home/federico/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in
from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/federico/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in
_pywrap_tensorflow_internal = swig_import_helper()
File "/home/federico/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/usr/lib/python3.5/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: libnvidia-fatbinaryloader.so.375.39: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
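The ImportError at the bottom of this trace is an ordinary dlopen failure, so it can be reproduced in isolation, without importing TensorFlow at all. A small check (the soname is copied verbatim from the traceback above; it resolves only when the driver directory, e.g. /usr/lib/nvidia-375, is on the loader's search path):

```python
import ctypes

# Attempt the same dlopen that the TensorFlow extension module performs.
# The soname comes from the ImportError above.
try:
    ctypes.CDLL("libnvidia-fatbinaryloader.so.375.39")
    print("driver library found")
except OSError as err:
    print("dlopen failed:", err)
```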

@ali01 ali01 added stat:contribution welcome Status - Contributions welcome type:feature Feature requests labels May 15, 2017
@jubjamie
Contributor

jubjamie commented May 16, 2017

Please can you clarify. You are unable to load tensorflow-gpu when your graphics card is off. But you used to be able to? And you can run tensorflow-gpu ok when the GPU is on? I'm almost certain this is intentional as Tensorflow won't be able to access the GPU and won't function.

Can I suggest that you read through #9071 to see if it relates to you. I notice that you are on 16.10 however I have no issues using Tensorflow on that OS. The usual advice of checking drivers and dependencies applies, especially as you appear to be installing tensorflow-gpu frequently.

I'm sure a Tensorflower will be able to clarify, but in the meantime may I suggest using some form of environment manager such as virtualenv or anaconda. That lets you have both tensorflow and tensorflow-gpu installed side by side and swap between the two depending on whether your GPU is on, without having to uninstall and reinstall each time.

EDIT: I forgot to mention that you can also use with tf.device('/cpu:0'): to run code only on the CPU. I guess it depends on how your code is set up and whether it's easier to just switch Tensorflow instances. But that's 2x the storage and updates.

@FedericoMuciaccia
Author

  1. Yes: with the old releases I was able to load tensorflow-gpu even if my Nvidia card was shut down via PRIME. There were only some warnings about the failed loading of some CUDA stuff (which I cannot remember exactly now), but I was able to run computations correctly, without the need to install the "tensorflow" package.
  2. Yes: with the current release (and with only tensorflow-gpu installed) I'm able to run computations only when I re-enable the Nvidia card via PRIME. If the dedicated card is disabled it's impossible to import the tensorflow module: import tensorflow fails.
  3. My OS didn't change. All the dependencies are automatically tracked by pip and all of them are updated to the latest version available in pip. I'm using the 375.66 Nvidia drivers and they didn't change. The only thing that changed is the installed tensorflow-gpu release (upgraded). I'm not too deep into the technical details, but ImportError: libnvidia-fatbinaryloader.so.375.51: cannot open shared object file #9071 doesn't seem immediately related to me, though I could be wrong. I cannot downgrade my Ubuntu installation now to run tests.
  4. I will surely try the virtual environment workaround soon, but I think the tensorflow framework should work out of the box, so I hope this bug will gain attention. I don't think it should carry the type:feature label, because this bug is a huge handicap for all the people using Nvidia PRIME (and also Nvidia Optimus?) to get a decent battery life on their laptops.
  5. I cannot run with tf.device('/cpu:0'): because I cannot define tf at all: import tensorflow as tf returns the error posted above.

Thanks a lot for the time you spent answering me.

@jubjamie
Contributor

1/2. I'm quite surprised about this. I'm not sure that was intended. Perhaps Tensorflow treated it as an insufficient card. It could be that an Nvidia driver change now hides the card completely?

  3. Your OS shouldn't be an issue. I run on 16.10 with no issues. What version of CUDA/cuDNN are you using? I'm aware that some newer versions are not fully supported by Tensorflow at the moment.

  4. I wouldn't quite call this a workaround. It is a recommended way of working with Python and these kinds of installs, as it lets you partially sandbox installations so that they don't break each other. It also lets you test a newer version of Tensorflow and see whether it breaks anything.
    Saying that Tensorflow should work out of the box is a little tricky. Consider that Tensorflow is often compiled from source for certain platforms, and having one binary that fits all might be a bit of a push.

Sorry this doesn't help you much, but I think your solution is to use different versions of Tensorflow.

@gunan
Contributor

gunan commented May 19, 2017

@FedericoMuciaccia Could you try setting LD_LIBRARY_PATH to point to where your CUDA and CUDNN so files are located?
You can find out more on how to set these in NVIDIA's CUDA and CUDNN installation documentation.
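Because LD_LIBRARY_PATH is read by the dynamic loader when a process starts, it has to be set before the Python interpreter launches. A minimal sketch of one way to do that from Python itself, by relaunching a child interpreter with an augmented environment (the paths are assumptions based on a typical Ubuntu install of CUDA 8.0 and the 375 driver, and may differ per machine):

```python
import os
import subprocess
import sys

# Build an environment with the driver/CUDA library directories prepended.
# These paths are assumptions for a typical Ubuntu install.
env = dict(os.environ)
extra = "/usr/local/cuda/lib64:/usr/lib/nvidia-375"
env["LD_LIBRARY_PATH"] = extra + ":" + env.get("LD_LIBRARY_PATH", "")

# Launch a fresh interpreter with the augmented environment; a child
# process started this way sees the new loader path and could then
# attempt `import tensorflow`.
out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['LD_LIBRARY_PATH'])"],
    env=env, capture_output=True, text=True,
).stdout
print(out.strip())
```

The equivalent shell form is `export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib/nvidia-375:$LD_LIBRARY_PATH` before starting Python.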

@gunan gunan added the stat:awaiting response Status - Awaiting response from author label May 19, 2017
@FedericoMuciaccia
Author

FedericoMuciaccia commented May 20, 2017

@jubjamie

  • cuda: 8.0.44
  • cudnn: 5.1.5

@gunan
Searching the web, I found that the correct way to set the library path system-wide on Ubuntu is this:

  • create a file /etc/ld.so.conf.d/cuda_things.conf
  • add the line /usr/lib/x86_64-linux-gnu inside it (where all my libs actually are)
  • run sudo ldconfig

With all this done, the problem is still present.

@aselle aselle removed the stat:awaiting response Status - Awaiting response from author label May 22, 2017
@gunan
Contributor

gunan commented May 22, 2017

Did you try also adding /usr/lib/nvidia-375 to your LD_LIBRARY_PATH?

A quick Google search shows some people ran into the exact same issue you are seeing, and were able to resolve it:
https://stackoverflow.com/questions/42678439/importerror-libnvidia-fatbinaryloader-so-375-39-cannot-open-shared-object-file

@gunan gunan added the stat:awaiting response Status - Awaiting response from author label May 22, 2017
@girving
Contributor

girving commented Jun 16, 2017

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

@girving girving closed this as completed Jun 16, 2017