tf.data.Dataset.list_files result in segfault #44878

Closed
gnthibault opened this issue Nov 14, 2020 · 2 comments
Labels: comp:data (tf.data related issues), type:bug (Bug)

Comments

@gnthibault

gnthibault commented Nov 14, 2020

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Not really
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No
  • TensorFlow installed from (source or binary): binary, via pip install tf-nightly
  • TensorFlow version (use command below): tf-nightly==2.5.0.dev20201114
  • Python version: 3.7.0
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: Irrelevant
  • GPU model and memory: GTX680, compute capability too low to be used (3.0)

You can collect some of this information using our environment capture
script
You can also obtain the TensorFlow version with:

  1. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
    v1.12.1-45796-gbefea92a3d 2.5.0-dev20201114

Describe the current behavior

In [1]: import tensorflow as tf
2020-11-14 21:20:53.955783: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
2020-11-14 21:20:53.955823: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

In [2]: list_ds = tf.data.Dataset.list_files("/")
2020-11-14 21:20:58.154084: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-11-14 21:20:58.155020: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2020-11-14 21:20:58.193006: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-14 21:20:58.193331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1724] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 680 computeCapability: 3.0
coreClock: 1.176GHz coreCount: 8 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 179.05GiB/s
2020-11-14 21:20:58.193434: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
2020-11-14 21:20:58.193503: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
2020-11-14 21:20:58.193567: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
2020-11-14 21:20:58.195053: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2020-11-14 21:20:58.195400: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2020-11-14 21:20:58.197271: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2020-11-14 21:20:58.197427: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
2020-11-14 21:20:58.197525: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
2020-11-14 21:20:58.197545: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1761] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-11-14 21:20:58.198519: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-11-14 21:20:58.198566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1265] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-14 21:20:58.198579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1271]      
Segmentation fault (core dumped)

Describe the expected behavior

The call should succeed and return a dataset object.

Standalone code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to Colab/Jupyter/any notebook.

import tensorflow as tf 
list_ds = tf.data.Dataset.list_files("/")
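
Because the crash kills the whole interpreter, it can be easier to run the reproduction in a child process and inspect its exit status. The following is only a sketch: `run_isolated`, `describe_exit`, and `REPRO` are hypothetical names introduced here, and the status reported will depend on the TensorFlow build installed in the environment.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 120.0) -> subprocess.CompletedProcess:
    """Run a Python snippet in a fresh interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


# The two-line reproduction from this report, run in a child process.
REPRO = 'import tensorflow as tf\nlist_ds = tf.data.Dataset.list_files("/")\n'

if __name__ == "__main__":
    result = run_isolated(REPRO)
    print(describe_exit(result.returncode))
```

On a Linux machine affected by this issue, the child would be expected to end with SIGSEGV while the parent keeps running, which makes it practical to compare several installs from one script.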

Other info / logs Include any logs or source code that would be helpful to
diagnose the problem. If including tracebacks, please include the full
traceback. Large logs and files should be attached.

I initially thought I could use my GTX680 GPU, so I installed the CUDA libraries and cuDNN along with tensorflow-gpu. When I realized its compute capability was too low, I installed plain tensorflow instead, hoping to avoid any GPU-related issues. I still got errors.

I also tried a bit of gdb:

Thread 30 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff2b7fe700 (LWP 561116)]
__strnlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:117

backtrace gives:
#0 __strnlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:117
#1 0x00007ffff7e7f1a6 in __fnmatch (pattern=, string=, flags=1) at fnmatch.c:342
#2 0x00007fffd5869ff7 in tensorflow::FileSystem::Match(std::string const&, std::string const&) ()
   from /home/gnthibault/Documents/tests/python/tf_install/venv/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#3 0x00007fffd586f2e4 in std::_Function_handler<void (int), tensorflow::internal::GetMatchingPaths(tensorflow::FileSystem*, tensorflow::Env*, std::string const&, std::vector<std::string, std::allocator<std::string> >*)::{lambda(int)#1}::operator()(int) const::{lambda(int)#1}>::_M_invoke(std::_Any_data const&, int&&) ()
   from /home/gnthibault/Documents/tests/python/tf_install/venv/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#4 0x00007fffd586ed62 in std::_Function_handler<void (), tensorflow::internal::(anonymous namespace)::ForEach(int, int, std::function<void (int)> const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
   from /home/gnthibault/Documents/tests/python/tf_install/venv/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#5 0x00007fffd5876021 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) ()
   from /home/gnthibault/Documents/tests/python/tf_install/venv/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#6 0x00007fffd5872d33 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
   from /home/gnthibault/Documents/tests/python/tf_install/venv/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#7 0x00007fffd1008335 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) ()
   from /home/gnthibault/Documents/tests/python/tf_install/venv/lib/python3.7/site-packages/tensorflow/python/../libtensorflow_framework.so.2
#8 0x00007ffff7f8b609 in start_thread (arg=) at pthread_create.c:477
#9 0x00007ffff7eb2293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
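
Frames #1 and #2 of the backtrace above show the crash inside glibc's fnmatch, called with flags=1 from tensorflow::FileSystem::Match. In glibc, flag value 1 is FNM_PATHNAME, which stops wildcards from matching a "/" in the path. The sketch below only illustrates that matching rule in pure Python. It is not TensorFlow's implementation: `match_pathname` is a hypothetical helper, and Python's `fnmatch` module differs from the C function in details such as escape handling.

```python
from fnmatch import fnmatchcase


def match_pathname(path: str, pattern: str) -> bool:
    """Match path against a shell pattern, never letting a wildcard cross a '/'."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    # With pathname semantics, both sides must have the same number of segments.
    if len(path_parts) != len(pattern_parts):
        return False
    # Each segment is matched on its own, so '*' and '?' stay within one segment.
    return all(fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts))


if __name__ == "__main__":
    print(match_pathname("/tmp/a.txt", "/tmp/*.txt"))
    print(match_pathname("/tmp/sub/a.txt", "/tmp/*.txt"))
    print(match_pathname("/", "/"))
```

Under these rules the pattern "/" from the reproduction is an ordinary literal pattern that matches only the root directory, so nothing about the pattern itself should require special handling.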

@gnthibault
Author

Installing tf-nightly-cpu fixed the issue.
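
Since several TensorFlow packages were installed in turn in this thread (tensorflow-gpu, tensorflow, tf-nightly, tf-nightly-cpu), it may help to check which of them are actually present in the virtual environment. This is a minimal sketch: `installed_distributions` and `TF_DISTRIBUTIONS` are hypothetical names, and it assumes Python 3.8 or later, where `importlib.metadata` is part of the standard library (the Python 3.7.0 used in the report would need the `importlib_metadata` backport instead).

```python
from importlib import metadata  # standard library from Python 3.8 onward

# Distribution names mentioned in this thread; each one provides the
# same `tensorflow` import package.
TF_DISTRIBUTIONS = ("tensorflow", "tensorflow-gpu", "tf-nightly", "tf-nightly-cpu")


def installed_distributions(names=TF_DISTRIBUTIONS):
    """Return {distribution name: version} for the names that are installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; skip it.
            continue
    return found


if __name__ == "__main__":
    for name, version in installed_distributions().items():
        print(f"{name}=={version}")
```

If more than one name is listed, the packages share the same import directory, and starting again from a clean virtual environment is probably the simplest way to rule out a mixed install.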

Saduf2019 added the comp:data (tf.data related issues) label on Nov 17, 2020