tf.data.Dataset.list_files result in segfault #44878

gnthibault · 2020-11-14T20:31:40Z

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Not really
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No
TensorFlow installed from (source or binary): binary, via pip install tf-nightly
TensorFlow version (use command below): tf-nightly==2.5.0.dev20201114
Python version: 3.7.0
Bazel version (if compiling from source):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version: Irrelevant
GPU model and memory: GTX680, compute capability too low to be used (3.0)

You can collect some of this information using our environment capture
script
You can also obtain the TensorFlow version with:

TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
v1.12.1-45796-gbefea92a3d 2.5.0-dev20201114

Describe the current behavior

In [1]: import tensorflow as tf
2020-11-14 21:20:53.955783: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
2020-11-14 21:20:53.955823: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

In [2]: list_ds = tf.data.Dataset.list_files("/")
2020-11-14 21:20:58.154084: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-11-14 21:20:58.155020: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2020-11-14 21:20:58.193006: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-14 21:20:58.193331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1724] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 680 computeCapability: 3.0
coreClock: 1.176GHz coreCount: 8 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 179.05GiB/s
2020-11-14 21:20:58.193434: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
2020-11-14 21:20:58.193503: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
2020-11-14 21:20:58.193567: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
2020-11-14 21:20:58.195053: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2020-11-14 21:20:58.195400: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2020-11-14 21:20:58.197271: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2020-11-14 21:20:58.197427: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
2020-11-14 21:20:58.197525: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 
2020-11-14 21:20:58.197545: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1761] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-11-14 21:20:58.198519: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-11-14 21:20:58.198566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1265] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-14 21:20:58.198579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1271]      
Segmentation fault (core dumped)

Describe the expected behavior

The call should succeed and return a dataset object.

Standalone code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to Colab/Jupyter/any notebook.

import tensorflow as tf 
list_ds = tf.data.Dataset.list_files("/")
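Because the crash kills the whole interpreter, it can be convenient to run the reproduction in a child process and inspect how it ended. The following is a minimal sketch, not part of the original report: the helper names `run_isolated` and `describe` are hypothetical, and the signal decoding assumes a POSIX system.

```python
import signal
import subprocess
import sys

# The two-line reproduction from this report, as a script for a child interpreter.
REPRO = 'import tensorflow as tf\nlist_ds = tf.data.Dataset.list_files("/")\n'


def run_isolated(code):
    """Run `code` in a fresh interpreter and return the child's exit status."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return result.returncode


def describe(returncode):
    """Turn an exit status into a short human-readable verdict."""
    if returncode < 0:
        # On POSIX, a negative status means the child was killed by that signal.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with code " + str(returncode)
```

Calling `describe(run_isolated(REPRO))` in an affected environment would be expected to report a SIGSEGV, while the parent session stays alive for further debugging.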

Other info / logs Include any logs or source code that would be helpful to
diagnose the problem. If including tracebacks, please include the full
traceback. Large logs and files should be attached.

I initially thought I could use my GTX680 GPU, so I installed the CUDA libraries and cuDNN along with tensorflow-gpu. When I realized its compute capability was too low, I installed plain tensorflow instead, hoping to avoid any GPU-related issues. I still got errors.

I also tried a bit of gdb:

Thread 30 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff2b7fe700 (LWP 561116)]
__strnlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:117

backtrace gives:
#0 __strnlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:117
#1 0x00007ffff7e7f1a6 in __fnmatch (pattern=<optimized out>, string=<optimized out>, flags=1) at fnmatch.c:342
#2 0x00007fffd5869ff7 in tensorflow::FileSystem::Match(std::string const&, std::string const&) ()
from /home/gnthibault/Documents/tests/python/tf_install/venv/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#3 0x00007fffd586f2e4 in std::_Function_handler<void (int), tensorflow::internal::GetMatchingPaths(tensorflow::FileSystem*, tensorflow::Env*, std::string const&, std::vector<std::string, std::allocator<std::string> >*)::{lambda(int)#1}::operator()(int) const::{lambda(int)#1}>::_M_invoke(std::_Any_data const&, int&&) ()
from /home/gnthibault/Documents/tests/python/tf_install/venv/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#4 0x00007fffd586ed62 in std::_Function_handler<void (), tensorflow::internal::(anonymous namespace)::ForEach(int, int, std::function<void (int)> const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /home/gnthibault/Documents/tests/python/tf_install/venv/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#5 0x00007fffd5876021 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) ()
from /home/gnthibault/Documents/tests/python/tf_install/venv/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#6 0x00007fffd5872d33 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
from /home/gnthibault/Documents/tests/python/tf_install/venv/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#7 0x00007fffd1008335 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) ()
from /home/gnthibault/Documents/tests/python/tf_install/venv/lib/python3.7/site-packages/tensorflow/python/../libtensorflow_framework.so.2
#8 0x00007ffff7f8b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#9 0x00007ffff7eb2293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
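The backtrace ends inside the C library's `fnmatch`, called from `tensorflow::FileSystem::Match` with `flags=1`. As a sanity check, the same libc function can be called directly through `ctypes` to see whether it behaves normally outside TensorFlow. This is only a sketch: the wrapper name `libc_fnmatch` is hypothetical, and it assumes that `flags=1` corresponds to glibc's `FNM_PATHNAME`.

```python
import ctypes
import ctypes.util

# glibc defines FNM_PATHNAME as (1 << 0); assumed to be the flags=1 in frame #1.
FNM_PATHNAME = 1

# Load the C library; fall back to the symbols already loaded in this process.
_libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
_libc.fnmatch.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
_libc.fnmatch.restype = ctypes.c_int


def libc_fnmatch(pattern, name, flags=FNM_PATHNAME):
    """Return True if the C library's fnmatch() reports a match (return value 0)."""
    return _libc.fnmatch(pattern.encode(), name.encode(), flags) == 0
```

If `libc_fnmatch("/", "/")` returns normally here, the libc routine itself is probably fine, which would point toward invalid string arguments being passed on the TensorFlow side.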


google-ml-butler · 2020-11-14T20:48:46Z

Are you satisfied with the resolution of your issue?
Yes
No

gnthibault · 2020-11-14T20:49:03Z

Installing tf-nightly-cpu fixed the issue.
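Since several TensorFlow packages were installed in turn in this environment (tensorflow-gpu, tensorflow, tf-nightly, and finally tf-nightly-cpu), it may help to list which of them are still present side by side. The sketch below is an illustration, not something from the thread: the helper name `installed_distributions` is hypothetical, it needs Python 3.8 or later for `importlib.metadata` (newer than the 3.7.0 reported above), and whether overlapping installs contributed to the crash is an untested assumption.

```python
from importlib import metadata  # Python 3.8+


def installed_distributions(prefixes=("tensorflow", "tf-nightly")):
    """Map each installed distribution whose name starts with a prefix to its version."""
    found = {}
    for dist in metadata.distributions():
        # Normalize the name: lowercase, underscores treated as hyphens.
        name = (dist.metadata["Name"] or "").lower().replace("_", "-")
        if name.startswith(tuple(prefixes)):
            found[name] = dist.version
    return found
```

Printing `installed_distributions()` would also list companion packages that share these prefixes (for example, names beginning with `tensorflow-`), so more than one entry is not necessarily a problem in itself.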

gnthibault added the type:bug Bug label Nov 14, 2020

google-ml-butler bot assigned Saduf2019 Nov 14, 2020

gnthibault closed this as completed Nov 14, 2020

Saduf2019 added the comp:data tf.data related issues label Nov 17, 2020
