CUDA 7.5 fails with pip install and docker (Ubuntu 14.04) #20

soumith · 2015-11-09T17:39:36Z

Installing via:

# For GPU-enabled version (only install this version if you have the CUDA sdk installed)
$ pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl

Tried to run alexnet_benchmark.py, and it's looking for CUDA 7.0 specifically.

I have CUDA 7.5 on my machine.

Full stack trace:

Traceback (most recent call last):
  File "alexnet_benchmark.py", line 21, in <module>
    import tensorflow.python.platform
  File "/home/awesomebox/anaconda/lib/python2.7/site-packages/tensorflow/__init__.py", line 4, in <module>
    from tensorflow.python import *
  File "/home/awesomebox/anaconda/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 22, in <module>
    from tensorflow.python.client.client_lib import *
  File "/home/awesomebox/anaconda/lib/python2.7/site-packages/tensorflow/python/client/client_lib.py", line 35, in <module>
    from tensorflow.python.client.session import InteractiveSession
  File "/home/awesomebox/anaconda/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 11, in <module>
    from tensorflow.python import pywrap_tensorflow as tf_session
  File "/home/awesomebox/anaconda/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/home/awesomebox/anaconda/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
ImportError: libcudart.so.7.0: cannot open shared object file: No such file or directory
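The last line is the one that matters. The extension asks the dynamic loader for the exact file name libcudart.so.7.0, but a CUDA 7.5 install provides libcudart.so.7.5 instead, so the lookup fails. Below is a minimal sketch for pulling the missing library name, and the version it encodes, out of such an error message. The helper names are hypothetical and not part of TensorFlow.

```python
import re

# Matches the loader message, e.g.
# "libcudart.so.7.0: cannot open shared object file: No such file or directory"
_MISSING_LIB = re.compile(r"(lib[\w+-]+\.so(?:\.\d+)*): cannot open shared object file")


def missing_soname(error_text):
    """Return the library file name the loader could not open, or None."""
    match = _MISSING_LIB.search(error_text)
    return match.group(1) if match else None


def soname_version(soname):
    """Split a name like 'libcudart.so.7.0' into ('libcudart', '7.0')."""
    base, _, version = soname.partition(".so")
    return base, version.lstrip(".")


if __name__ == "__main__":
    msg = ("ImportError: libcudart.so.7.0: cannot open shared object file: "
           "No such file or directory")
    lib = missing_soname(msg)
    print(lib)                  # libcudart.so.7.0
    print(soname_version(lib))  # ('libcudart', '7.0')
```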

Tried the Docker install too, but the Docker image is configured for a particular NVIDIA driver version and doesn't work with others. (This is a known issue: the driver version inside the Docker image and the host system's driver version must match exactly.)
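The driver version inside the image has to match the host's exactly, so one quick sanity check is to read the host's version before starting a container. This is a rough sketch that makes two assumptions. First, it assumes a Linux host where the NVIDIA kernel module exposes /proc/driver/nvidia/version. Second, it assumes you already know which driver version the image was built with; that value is an input here, not something the sketch discovers.

```python
import re


def parse_driver_version(proc_text):
    """Extract a version like '352.39' from /proc/driver/nvidia/version text."""
    match = re.search(r"Kernel Module\s+(\d+\.\d+(?:\.\d+)?)", proc_text)
    return match.group(1) if match else None


def read_host_driver_version(path="/proc/driver/nvidia/version"):
    """Return the host driver version, or None if the file can't be read."""
    try:
        with open(path) as f:
            return parse_driver_version(f.read())
    except OSError:
        # No NVIDIA kernel module loaded, or not a Linux host.
        return None


def image_matches_host(image_driver_version, host_driver_version):
    """True only when both versions are known and identical."""
    return (host_driver_version is not None
            and image_driver_version == host_driver_version)


if __name__ == "__main__":
    host = read_host_driver_version()
    # "352.39" is a placeholder for the driver version baked into the image.
    print(host, image_matches_host("352.39", host))
```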

soumith · 2015-11-09T17:49:04Z

@nivwusquorum haha, that's a terrible workaround, as it'll cause issues with other libraries. Thanks a lot, though. I'm installing CUDA 7.0.

lukesimo · 2015-11-09T17:51:02Z

@nivwusquorum no.

I'm running into the same issue. Looks like I'll be downgrading then.

ebrevdo · 2015-11-09T18:25:27Z

Out of curiosity, have you set your LD_LIBRARY_PATH to your CUDA installation's lib64 directory?

lukesimo · 2015-11-09T18:36:43Z

@ebrevdo yes

printenv LD_LIBRARY_PATH
/usr/local/cuda-7.5/lib64
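That setting explains the failure. LD_LIBRARY_PATH is a list of directories, and the loader looks in each one for a file with the exact name it was asked for. /usr/local/cuda-7.5/lib64 contains libcudart.so.7.5 but not libcudart.so.7.0, so the lookup still fails. Below is a simplified sketch of that one step of the search. The real loader also checks RPATH/RUNPATH entries, /etc/ld.so.cache and the default system directories.

```python
import os
import tempfile


def find_in_library_path(soname, ld_library_path=None):
    """Return the first LD_LIBRARY_PATH directory containing `soname`, else None."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in ld_library_path.split(os.pathsep):
        # Empty entries (which glibc treats as the current directory)
        # are skipped here to keep the sketch simple.
        if directory and os.path.exists(os.path.join(directory, soname)):
            return directory
    return None


if __name__ == "__main__":
    # Simulate a lib64 directory that only has the 7.5 runtime.
    lib64 = tempfile.mkdtemp()
    open(os.path.join(lib64, "libcudart.so.7.5"), "w").close()
    print(find_in_library_path("libcudart.so.7.5", lib64))  # the temp dir
    print(find_in_library_path("libcudart.so.7.0", lib64))  # None
```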

mdda · 2015-11-09T18:45:05Z

On the subject of CUDA library versions: CUDA 7.0 works for me (as expected), but it really insists on cuDNN 6.5 (which NVIDIA now lists as 'legacy').

Exact same library locations, etc., but downgrading from cuDNN 7.0 to 6.5 worked.
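The same exact-name rule applies to cuDNN. This assumes the usual naming, where the 6.5 release installs libcudnn.so.6.5 and the 7.0 release installs libcudnn.so.7.0. Under that assumption, a binary linked against the former will not pick up the latter. Below is a small sketch for listing which versions of a library are present in a directory. The helper name and the demo directory are hypothetical.

```python
import glob
import os
import tempfile


def installed_versions(lib_dir, base="libcudnn"):
    """Return the version suffixes of files named <base>.so.<version> in lib_dir."""
    prefix = base + ".so."
    names = (os.path.basename(p) for p in glob.glob(os.path.join(lib_dir, prefix + "*")))
    # Plain string sort; good enough for listing, not for version comparison.
    return sorted(name[len(prefix):] for name in names)


if __name__ == "__main__":
    lib64 = tempfile.mkdtemp()  # stands in for e.g. /usr/local/cuda/lib64
    open(os.path.join(lib64, "libcudnn.so.7.0"), "w").close()
    versions = installed_versions(lib64)
    print(versions)           # ['7.0']
    print("6.5" in versions)  # False: a binary wanting libcudnn.so.6.5 won't load
```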

graphific · 2015-11-09T19:36:13Z

Yes, it's within the compiled TensorFlow code, so some simple Python hacking won't solve it :)
(It's in _pywrap_tensorflow.so when you pip install the binary):

_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
ImportError: libcudart.so.7.0: cannot open shared object file: No such file or directory
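The CUDA version is baked into the compiled extension rather than chosen by Python code. On a Linux system with the standard ldd tool, you can see every library the extension expects by running ldd on the _pywrap_tensorflow.so that sits next to pywrap_tensorflow.py in the installed package. Look for entries reported as "not found". The sketch below parses that output. The function names are made up for illustration, and the sample text only mimics ldd's output format; it was not captured from a real run.

```python
import subprocess


def missing_from_ldd(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def missing_libraries(shared_object_path):
    """Run ldd on a shared object (Linux only) and list unresolved dependencies."""
    result = subprocess.run(["ldd", shared_object_path],
                            capture_output=True, text=True)
    return missing_from_ldd(result.stdout)


if __name__ == "__main__":
    sample = ("\tlinux-vdso.so.1 =>  (0x00007ffd5a1f2000)\n"
              "\tlibcudart.so.7.0 => not found\n"
              "\tlibm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3c1b2a0000)\n")
    print(missing_from_ldd(sample))  # ['libcudart.so.7.0']
```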

emergix · 2015-11-15T18:16:14Z

I assume I have the same problem:
I have CUDA 7.5 installed along with TensorFlow, and when I try (like in the tutorial about GPUs):

import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
print(c)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))

it breaks!
(In Torch7, I have no problems with GPUs.)

jimaldon · 2015-11-17T05:40:20Z

Can we assume TensorFlow is forward-compatible with CUDA 7.5?

andorremus · 2015-11-17T17:11:02Z

I've got the same problem.

So are you saying that downgrading from CUDA 7.5 to 7.0 should do the trick?

andorremus · 2015-11-17T20:33:53Z

If it helps: I installed the CUDA 7.0 toolkit, changed the reference in my bash profile to point to it, and it works.

emergix · 2015-11-17T20:51:17Z

Yes, you do not need to reinstall the CUDA 7.0 driver; just add the directory containing libcudart.so.7.0 to the end of the LD_LIBRARY_PATH variable. When I discussed this with someone on the team, they told me a strange story: it appears that the old CUDA drivers are very much in demand among people using Amazon AWS!
Can someone confirm?
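Appending works because the loader searches the LD_LIBRARY_PATH entries in order for the exact file name. It finds nothing called libcudart.so.7.0 in the 7.5 directory, moves on, and finds it in the 7.0 directory. The variable takes directories, not paths to individual .so files. Below is a small sketch of building the new value. It assumes the 7.0 toolkit was installed to /usr/local/cuda-7.0, so adjust the path to your layout. append_library_dir is a hypothetical helper.

```python
import os


def append_library_dir(current, new_dir):
    """Return an LD_LIBRARY_PATH value with new_dir appended once at the end."""
    # Drop empty entries so the result doesn't accidentally include the cwd.
    parts = [p for p in current.split(os.pathsep) if p]
    if new_dir not in parts:
        parts.append(new_dir)
    return os.pathsep.join(parts)


if __name__ == "__main__":
    value = append_library_dir("/usr/local/cuda-7.5/lib64",
                               "/usr/local/cuda-7.0/lib64")
    # On Linux: /usr/local/cuda-7.5/lib64:/usr/local/cuda-7.0/lib64
    print(value)
```

The resulting value is what you would export from ~/.bashrc.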

ebrevdo · 2015-11-23T04:25:13Z

Long story short: TensorFlow currently requires CUDA 7.0. If you install version 7.0 in a separate directory from 7.5 and point TensorFlow at it via the configure script (or LD_LIBRARY_PATH), it will work. Leaving this open to track future upgrades to the 7.5 SDK.
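If you'd rather not change your shell profile, the LD_LIBRARY_PATH route also works per process. The dynamic loader reads LD_LIBRARY_PATH when a process starts. Assigning os.environ['LD_LIBRARY_PATH'] inside a Python session that is already running therefore won't change where that session finds libraries, so the variable has to be set for a new process. Below is a sketch that assumes the 7.0 libraries live in /usr/local/cuda-7.0/lib64. run_with_library_dir is a hypothetical helper.

```python
import os
import subprocess
import sys


def run_with_library_dir(args, lib_dir):
    """Run `python <args>` in a child process with lib_dir first on LD_LIBRARY_PATH."""
    env = dict(os.environ)
    existing = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = (lib_dir + os.pathsep + existing) if existing else lib_dir
    return subprocess.run([sys.executable] + list(args), env=env)


if __name__ == "__main__":
    result = run_with_library_dir(["alexnet_benchmark.py"],
                                  "/usr/local/cuda-7.0/lib64")
    print("exit code:", result.returncode)
```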

FabHan · 2015-12-08T11:45:40Z

I'm curious: why is it hard to upgrade to CUDA 7.5 and cuDNN v3? Can anyone help me understand?

pannous · 2015-12-09T09:04:34Z

@emergix, so you got it to work just by providing libcudart.so.7.0, without reinstalling or downgrading to the old CUDA?

andorremus · 2015-12-09T09:07:56Z

Yes. It doesn't require you to uninstall the previous one. You can just install them side by side and reference v7.0 in your .bashrc file.

pannous · 2015-12-09T10:36:10Z

Thanks! Workaround confirmed here.

fivejjs · 2016-01-04T13:23:07Z

Symlink libcuda_.7.5 to libcuda_.7.0.
It works.
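A system-wide alias can confuse other software that expects the real file names. A less invasive variant is to keep the alias in a private directory and put only that directory on LD_LIBRARY_PATH for the program that needs it. This is a sketch under assumptions: the library names below are placeholders, make_alias_dir is a hypothetical helper, and nothing guarantees that a 7.5 library behaves like the 7.0 one the binary was built against.

```python
import os
import tempfile


def make_alias_dir(real_library, alias_name, alias_dir=None):
    """Create alias_name -> real_library inside a private directory and return it.

    Only programs started with this directory on LD_LIBRARY_PATH will see the alias.
    """
    if alias_dir is None:
        alias_dir = tempfile.mkdtemp(prefix="cuda-alias-")
    link = os.path.join(alias_dir, alias_name)
    if os.path.lexists(link):
        os.remove(link)  # replace a stale alias
    os.symlink(real_library, link)
    return alias_dir


if __name__ == "__main__":
    # Hypothetical paths; adjust to the actual library names on your system.
    d = make_alias_dir("/usr/local/cuda-7.5/lib64/libcudart.so.7.5",
                       "libcudart.so.7.0")
    print("export LD_LIBRARY_PATH=" + d)
```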

esube · 2016-01-14T15:58:43Z

This issue and related ones have been open for more than two months now. Is there any progress on supporting CUDA 7.5 and cuDNN 7.0, other than the workarounds? This could be a turn-off for people who have already been working with CUDA 7.5 for a while.

kmhofmann · 2016-01-14T17:08:07Z

I completely agree. Ideally (assuming reasonable API/ABI stability on NVIDIA's side), TensorFlow should not depend on specific older versions of CUDA and cuDNN. (I'd understand it better if the latest version were required to make use of certain features.)
This is the one issue that puts me off using TensorFlow with GPU support.

cmcneil · 2016-01-16T23:26:26Z

+1 Please support CUDA 7.5

ville-k · 2016-01-16T23:51:58Z

If you want to try out CUDA 7.5 under Linux, you could try building my pull request branch:
#664
It adds support for CUDA on OSX and uses CUDA 7.5 when building under OSX. I haven't tried 7.5 under Linux, but it seems to work ok with OSX.
The only change you'd need to make is to edit the configure file and set CUDA_VERSION='7.5' when Linux is detected. Lines 48-49 would look like:

if [ "$OSNAME" == "Linux" ]; then
  CUDA_VERSION='7.5'
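For scripted builds, the same one-line edit can be applied programmatically. This sketch assumes the Linux branch of configure consists of the if line quoted above followed by a single-quoted CUDA_VERSION assignment. It also assumes the current value is presumably '7.0', given that 7.0 is hard-coded for Linux. Check your copy of configure first: the pattern is an assumption, not taken from the PR, and set_linux_cuda_version is a hypothetical helper.

```python
import re

# Assumed shape of the Linux branch in configure:
#   if [ "$OSNAME" == "Linux" ]; then
#     CUDA_VERSION='...'
_LINUX_CUDA = re.compile(
    r"""(if \[ "\$OSNAME" == "Linux" \]; then\s+CUDA_VERSION=)'[^']*'""")


def set_linux_cuda_version(configure_text, version="7.5"):
    """Return configure_text with the Linux CUDA_VERSION set to `version`."""
    new_text, count = _LINUX_CUDA.subn(r"\g<1>'" + version + "'",
                                       configure_text, count=1)
    if count != 1:
        raise ValueError("Linux CUDA_VERSION assignment not found")
    return new_text


if __name__ == "__main__":
    snippet = 'if [ "$OSNAME" == "Linux" ]; then\n  CUDA_VERSION=\'7.0\'\nfi\n'
    print(set_linux_cuda_version(snippet))
```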


soumith changed the title from "CUDA 7.5 fails with pip install and docker" to "CUDA 7.5 fails with pip install and docker (Ubuntu 14.04)" Nov 9, 2015

teamdandelion added the installation/startup label Nov 9, 2015

keveman added the cuda label Nov 9, 2015

This was referenced Nov 11, 2015:

CUDA 7.0 is hard-coded in configure script for Linux #131 (closed)

Can anyone install it with cuda7.5 and cudnn 7.0? #125 (closed)

This was referenced Nov 24, 2015:

TensorFlow with Cuda 7.5 on Ubuntu 15.4? #318 (closed)

Support cuda 7.5 and cudnn 7.0 #54 (closed)

ruffsl mentioned this issue Dec 3, 2015:

Official Tensorflow Docker Image #149 (closed)

martinwicke assigned zheng-xq Jan 14, 2016

