Trouble running the pytorch example
Segmentation fault using pytorch when reading dataset
A workaround: always import pyarrow before import torch.
torch/_C.so is loaded with the RTLD_GLOBAL flag. As a result, the dynamic linker places all the symbols exported by _C.so into the global scope. When pyarrow shared libraries are loaded afterwards, their symbol references may be resolved using the symbols exported by _C.so.
_C.so exports some of the standard C++ library symbols, so a crash may occur if the versions of the standard C++ libraries used to build _C.so and pyarrow differ.
Loading in reverse order is fine, since pyarrow libraries do not expose their symbols in the linker's global namespace.
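The RTLD_GLOBAL mechanism described above can be sketched in plain Python. CPython passes the flags set with sys.setdlopenflags() to dlopen() when it loads extension modules, and older torch releases appear to have used this kind of call to load torch/_C.so globally. The dlopen_flags helper and GLOBAL_FLAGS constant below are hypothetical, written only to illustrate the effect; they are not part of torch or pyarrow.

```python
import os
import sys
from contextlib import contextmanager

# Flags that make an extension's exported symbols globally visible (POSIX only).
GLOBAL_FLAGS = os.RTLD_GLOBAL | os.RTLD_LAZY


@contextmanager
def dlopen_flags(flags):
    """Hypothetical helper: temporarily change the flags CPython passes to dlopen()."""
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(flags)
    try:
        yield
    finally:
        # Restore the previous flags so later imports use the default (local) scope.
        sys.setdlopenflags(old_flags)


# Roughly what happens when torch/_C.so is loaded globally: any extension
# imported inside the block exports its symbols (including libstdc++ ones)
# into the global scope, where libraries loaded later can bind to them.
#
#   with dlopen_flags(GLOBAL_FLAGS):
#       import torch
#
# The workaround only changes the import order:
#
#   import pyarrow
#   import torch
```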
Another workaround is to preload tcmalloc when running the example:
sudo apt-get install libtcmalloc-minimal4
LD_PRELOAD="/usr/lib/libtcmalloc_minimal.so.4" python examples/mnist/pytorch_example.py
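If you launch the example from a Python driver rather than a shell, the same LD_PRELOAD trick can be applied to a child process. This is a minimal sketch assuming the library path from the command above; env_with_preload and run_example are hypothetical helpers. LD_PRELOAD only takes effect when a process starts, which is why it goes into the child's environment instead of the running interpreter's os.environ.

```python
import os
import subprocess
import sys

# Path taken from the command above; on some distributions the library
# may live elsewhere (for example under a multiarch directory).
TCMALLOC = "/usr/lib/libtcmalloc_minimal.so.4"


def env_with_preload(lib, base=None):
    """Return a copy of `base` (default: os.environ) with `lib` prepended to LD_PRELOAD."""
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_PRELOAD")
    # ld.so accepts colon- or space-separated LD_PRELOAD entries.
    env["LD_PRELOAD"] = f"{lib}:{existing}" if existing else lib
    return env


def run_example(script="examples/mnist/pytorch_example.py"):
    """Run the example in a child process with tcmalloc preloaded."""
    subprocess.run([sys.executable, script], env=env_with_preload(TCMALLOC), check=True)
```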
Import error due to dlopen failing to load more objects into static thread-local storage (TLS)
If you see the following error while trying to run the pytorch example, you are in luck:
  File "/usr/local/lib/python2.7/dist-packages/torch/__init__.py", line 80, in <module>
    from torch._C import *
ImportError: dlopen: cannot load any more object with static TLS
This problem stems from a known defect in the glibc dlopen logic, which made conservative assumptions about static thread-local storage, specifically about the number of surplus DTV (dynamic thread vector) slots. Those assumptions no longer suffice for modern compute needs.
Solutions, in increasing order of effort involved:
- import torch as early as possible (tensorflow models issue 523)
- Build pytorch from source (see below, also pytorch issue 643)
- Patch glibc to increase surplus DTV slots from 14 to 32 or 64.
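The first solution amounts to importing torch at the very top of your entry-point script, presumably so that it claims static TLS before other native extensions use up the surplus slots. A minimal sketch; import_early is a hypothetical helper that also turns the static TLS ImportError into a pointer to the other two solutions.

```python
import importlib

# Substring of the ImportError message quoted above.
STATIC_TLS_MSG = "cannot load any more object with static TLS"


def import_early(module_name="torch"):
    """Import `module_name`; call this before importing other native extensions."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        if STATIC_TLS_MSG in str(exc):
            raise ImportError(
                f"{module_name} ran out of static TLS slots; try building it from "
                "source or patching glibc's surplus DTV slots"
            ) from exc
        raise  # unrelated import problem: re-raise unchanged


# At the top of your entry-point script:
#   torch = import_early("torch")
#   import tensorflow  # other heavy native extensions afterwards
```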
For background, this issue has been reported since 2012, including a 2013 report involving Matlab.
For additional references, see the glibc bug report and fix,
and the accompanying Debian glibc bug report 793689. According to 793641,
some variant of the static TLS fix was included in
The OpenMP library libgomp.so.1 has had this fix in place since circa 2015.
Ubuntu Xenial and later also contain this fix, but updating the operating system alone may not be sufficient if torch links against its own version of glibc that still uses static TLS. An ldd analysis (per 793689 comment 20) can reveal whether libraries such as torch are actually still using static TLS.
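A sketch of that analysis, assuming a Linux host with ldd and binutils' readelf on the PATH. It uses ldd to list the shared libraries a module links against and, as a swapped-in check not taken from the bug report, readelf -d to look for the STATIC_TLS flag in each library's dynamic section. All helper names are hypothetical.

```python
import subprocess


def parse_ldd(output):
    """Extract resolved library paths from `ldd` output."""
    paths = []
    for line in output.splitlines():
        # Typical line: "libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f...)"
        if "=>" not in line:
            continue
        target = line.split("=>", 1)[1].split()
        if target and target[0].startswith("/"):  # skips "=> not found"
            paths.append(target[0])
    return paths


def has_static_tls_flag(readelf_output):
    """True if `readelf -d` output shows STATIC_TLS in the (FLAGS) entry."""
    for line in readelf_output.splitlines():
        fields = line.split()
        if "(FLAGS)" in fields and "STATIC_TLS" in fields:
            return True
    return False


def _run(cmd):
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def static_tls_libraries(shared_object):
    """List `shared_object` and its dependencies that are marked STATIC_TLS."""
    candidates = [shared_object] + parse_ldd(_run(["ldd", shared_object]))
    return [lib for lib in candidates if has_static_tls_flag(_run(["readelf", "-d", lib]))]


# Usage (example path; point it at your installed torch/_C.so):
#   print(static_tls_libraries("/usr/local/lib/python2.7/dist-packages/torch/_C.so"))
```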
Building pytorch from Dockerfile and using it
If you choose to build pytorch from source, you can do so using the pytorch Dockerfile as follows:
- Clone the pytorch repo and, from its root directory, run:
  docker build -t pytorch -f docker/pytorch/Dockerfile --build-arg PYTHON_VERSION=2.7.6 .
  * Set PYTHON_VERSION to your version of choice, or leave it out to use the pytorch Dockerfile default
- Build the custom pytorch Docker image:
docker build -t petastorm_torch -f examples/mnist/pytorch/Dockerfile .
- Run the container and work with your code:
docker run -it --rm petastorm_torch:latest /bin/bash
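The three steps above can also be scripted. This is a minimal sketch assuming both repositories are already cloned locally and docker is on the PATH; the function names and directory arguments are hypothetical. Passing python_version=None omits the --build-arg, so the pytorch Dockerfile default applies, as described in the first step.

```python
import subprocess


def pytorch_image_cmd(python_version=None):
    """Step 1: build the base pytorch image (run from the pytorch repo root)."""
    cmd = ["docker", "build", "-t", "pytorch", "-f", "docker/pytorch/Dockerfile"]
    if python_version:
        cmd += ["--build-arg", f"PYTHON_VERSION={python_version}"]
    return cmd + ["."]


def petastorm_torch_image_cmd():
    """Step 2: build the custom pytorch image."""
    return ["docker", "build", "-t", "petastorm_torch", "-f", "examples/mnist/pytorch/Dockerfile", "."]


def run_container_cmd():
    """Step 3: start an interactive shell in the custom image."""
    return ["docker", "run", "-it", "--rm", "petastorm_torch:latest", "/bin/bash"]


def build_and_run(pytorch_dir, example_repo_dir, python_version=None):
    # Each build runs from the directory its Dockerfile path is relative to.
    subprocess.run(pytorch_image_cmd(python_version), cwd=pytorch_dir, check=True)
    subprocess.run(petastorm_torch_image_cmd(), cwd=example_repo_dir, check=True)
    subprocess.run(run_container_cmd(), check=True)
```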