
tfdv manylinux pypi packages are built/linked on too new of a platform for general compatibility #76

Closed
kwlzn opened this issue Jul 24, 2019 · 8 comments


@kwlzn

kwlzn commented Jul 24, 2019

When we attempt to use the current manylinux bdist from PyPI (tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl) on a CentOS 7 machine, we see the following ImportError:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-1ad020593972> in <module>()
1 import pkg_resources, importlib
2 importlib.reload(pkg_resources)
----> 3 import tensorflow_data_validation as tfdv
 
/var/lib/mesos/slaves/8bfbe6e2-3bf7-4b49-90c3-15e4be759186-S218/frameworks/201104070004-0000002563-0000/executors/thermos-kwilson-devel-pycx-notebook-0-e9d46056-d500-4c44-9970-ebab4e39f006/runs/18847da2-8502-4393-a01d-c5bfc264f405/sandbox/.pex/install/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl.840798a46d57eb5c1ed0f639d3f47149480121e9/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl/tensorflow_data_validation/__init__.py in <module>()
19
20 # Import validation API.
---> 21 from tensorflow_data_validation.api.validation_api import infer_schema
22 from tensorflow_data_validation.api.validation_api import validate_instance
23 from tensorflow_data_validation.api.validation_api import validate_statistics
 
/var/lib/mesos/slaves/8bfbe6e2-3bf7-4b49-90c3-15e4be759186-S218/frameworks/201104070004-0000002563-0000/executors/thermos-kwilson-devel-pycx-notebook-0-e9d46056-d500-4c44-9970-ebab4e39f006/runs/18847da2-8502-4393-a01d-c5bfc264f405/sandbox/.pex/install/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl.840798a46d57eb5c1ed0f639d3f47149480121e9/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl/tensorflow_data_validation/api/validation_api.py in <module>()
26 import tensorflow as tf
27 from tensorflow_data_validation import types
---> 28 from tensorflow_data_validation.pywrap import pywrap_tensorflow_data_validation
29 from tensorflow_data_validation.statistics import stats_impl
30 from tensorflow_data_validation.statistics import stats_options
 
/var/lib/mesos/slaves/8bfbe6e2-3bf7-4b49-90c3-15e4be759186-S218/frameworks/201104070004-0000002563-0000/executors/thermos-kwilson-devel-pycx-notebook-0-e9d46056-d500-4c44-9970-ebab4e39f006/runs/18847da2-8502-4393-a01d-c5bfc264f405/sandbox/.pex/install/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl.840798a46d57eb5c1ed0f639d3f47149480121e9/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl/tensorflow_data_validation/pywrap/pywrap_tensorflow_data_validation.py in <module>()
26                 fp.close()
27             return _mod
---> 28     _pywrap_tensorflow_data_validation = swig_import_helper()
29     del swig_import_helper
30 else:
 
/var/lib/mesos/slaves/8bfbe6e2-3bf7-4b49-90c3-15e4be759186-S218/frameworks/201104070004-0000002563-0000/executors/thermos-kwilson-devel-pycx-notebook-0-e9d46056-d500-4c44-9970-ebab4e39f006/runs/18847da2-8502-4393-a01d-c5bfc264f405/sandbox/.pex/install/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl.840798a46d57eb5c1ed0f639d3f47149480121e9/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl/tensorflow_data_validation/pywrap/pywrap_tensorflow_data_validation.py in swig_import_helper()
22         if fp is not None:
23             try:
---> 24                 _mod = imp.load_module('_pywrap_tensorflow_data_validation', fp, pathname, description)
25             finally:
26                 fp.close()
 
/opt/ee/python/3.6/lib/python3.6/imp.py in load_module(name, file, filename, details)
241                 return load_dynamic(name, filename, opened_file)
242         else:
--> 243             return load_dynamic(name, filename, file)
244     elif type_ == PKG_DIRECTORY:
245         return load_package(name, filename)
 
/opt/ee/python/3.6/lib/python3.6/imp.py in load_dynamic(name, path, file)
341         spec = importlib.machinery.ModuleSpec(
342             name=name, loader=loader, origin=path)
--> 343         return _load(spec)
344
345 else:
 
ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /var/lib/mesos/slaves/8bfbe6e2-3bf7-4b49-90c3-15e4be759186-S218/frameworks/201104070004-0000002563-0000/executors/thermos-kwilson-devel-pycx-notebook-0-e9d46056-d500-4c44-9970-ebab4e39f006/runs/18847da2-8502-4393-a01d-c5bfc264f405/sandbox/.pex/install/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl.840798a46d57eb5c1ed0f639d3f47149480121e9/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl/tensorflow_data_validation/pywrap/_pywrap_tensorflow_data_validation.so)

ldd on the inner .so reports the same problem, plus a second missing symbol version (CXXABI_1.3.8):

$ ldd .pex/install/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl.840798a46d57eb5c1ed0f639d3f47149480121e9/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl/tensorflow_data_validation/pywrap/_pywrap_tensorflow_data_validation.so
.pex/install/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl.840798a46d57eb5c1ed0f639d3f47149480121e9/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl/tensorflow_data_validation/pywrap/_pywrap_tensorflow_data_validation.so: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by .pex/install/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl.840798a46d57eb5c1ed0f639d3f47149480121e9/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl/tensorflow_data_validation/pywrap/_pywrap_tensorflow_data_validation.so)
.pex/install/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl.840798a46d57eb5c1ed0f639d3f47149480121e9/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl/tensorflow_data_validation/pywrap/_pywrap_tensorflow_data_validation.so: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by .pex/install/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl.840798a46d57eb5c1ed0f639d3f47149480121e9/tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl/tensorflow_data_validation/pywrap/_pywrap_tensorflow_data_validation.so)
        linux-vdso.so.1 =>  (0x00007ffff1ebc000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f06a4570000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f06a4354000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f06a4052000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f06a3d4b000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f06a3b35000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f06a3768000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f06a4e42000)
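To check which symbol versions an extension needs without trying to load it, a rough heuristic is to scan the raw bytes of the .so for version names, since ELF stores them as plain strings in the dynamic string table. The sketch below assumes that layout; it may also count versions the file defines rather than requires, so `readelf -V` or `objdump -T` remain the exact tools. `max_required_versions` is a hypothetical helper, not part of TFDV or any library.

```python
import re
from typing import Dict, Tuple

# Heuristic: match version names such as b"GLIBCXX_3.4.20" or b"CXXABI_1.3.8".
# Names without a numeric version (e.g. GLIBC_PRIVATE) are ignored.
_VERSION_RE = re.compile(rb"\b(GLIBCXX|CXXABI|GLIBC)_(\d+(?:\.\d+)*)\b")


def max_required_versions(data: bytes) -> Dict[str, Tuple[int, ...]]:
    """Return the highest version seen per prefix (GLIBC, GLIBCXX, CXXABI)."""
    highest: Dict[str, Tuple[int, ...]] = {}
    for match in _VERSION_RE.finditer(data):
        prefix = match.group(1).decode("ascii")
        version = tuple(int(part) for part in match.group(2).split(b"."))
        # Tuple comparison orders versions numerically: (3, 4, 20) > (3, 4, 9).
        if version > highest.get(prefix, ()):
            highest[prefix] = version
    return highest


# Example (the path is a placeholder for the extension module inside the wheel):
# with open("_pywrap_tensorflow_data_validation.so", "rb") as f:
#     print(max_required_versions(f.read()))
```

Run against the failing module, this kind of scan would be expected to surface the GLIBCXX_3.4.20 and CXXABI_1.3.8 requirements that ldd complains about.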

This points to the extension having been built/linked on a newer system than this configuration supports:

sh-4.2$ cat /etc/redhat-release
CentOS release 7.6.1810 (Core)
sh-4.2$ rpm -q glibc libstdc++
glibc-2.17-260.el7_6.3.x86_64
libstdc++-4.8.5-36.el7.x86_64
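The mismatch can be made concrete by comparing the required versions against what the host's libstdc++ exports. The sketch assumes that CentOS 7's libstdc++-4.8.5 (GCC 4.8) tops out at GLIBCXX_3.4.19 and CXXABI_1.3.7, with GLIBCXX_3.4.20 and CXXABI_1.3.8 first appearing in GCC 4.9; `unsatisfied` and `CENTOS7_LIBSTDCXX_MAX` are illustrative names only.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]

# Assumed ceilings for CentOS 7's libstdc++-4.8.5: the newest symbol
# versions it exports. GLIBCXX_3.4.20 / CXXABI_1.3.8 arrived with GCC 4.9.
CENTOS7_LIBSTDCXX_MAX: Dict[str, Version] = {
    "GLIBCXX": (3, 4, 19),
    "CXXABI": (1, 3, 7),
}


def parse_symbol_version(name: str) -> Tuple[str, Version]:
    """Split e.g. 'GLIBCXX_3.4.20' into ('GLIBCXX', (3, 4, 20))."""
    prefix, _, version = name.partition("_")
    return prefix, tuple(int(part) for part in version.split("."))


def unsatisfied(required: List[str], available: Dict[str, Version]) -> List[str]:
    """Return the required symbol versions newer than the host provides."""
    missing = []
    for name in required:
        prefix, version = parse_symbol_version(name)
        ceiling = available.get(prefix)
        # Prefixes with no known ceiling are skipped rather than guessed at.
        if ceiling is not None and version > ceiling:
            missing.append(name)
    return missing


print(unsatisfied(["GLIBCXX_3.4.20", "CXXABI_1.3.8", "GLIBCXX_3.4.11"], CENTOS7_LIBSTDCXX_MAX))
# ['GLIBCXX_3.4.20', 'CXXABI_1.3.8']
```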

Thus I'm fairly sure these binaries aren't actually manylinux (or even broadly CentOS 7) compatible.

@paulgc
Member

paulgc commented Jul 24, 2019

@kwlzn The current wheels aren't manylinux compatible. We are working towards releasing manylinux2010 compatible wheels. Have you tried building from source using the instructions here? Let us know if you need any help with this.

@brills

@kwlzn
Author

kwlzn commented Jul 24, 2019

@paulgc Then shouldn't the wheels on PyPI drop the manylinux1_x86_64 platform tag from their names, to conform to the manylinux contract?

While we can build from source, it's a bit of a hassle, especially given the non-conforming build process (no simple python setup.py bdist_wheel interface, ambiguous multi-interpreter build steps, Bazel dependencies, etc.). Having pre-built manylinux1_x86_64 wheels published to PyPI by a TensorFlow-sanctioned CI process would be far more convenient (and more consistent and safe) for general consumption, IMHO.

@kwlzn
Author

kwlzn commented Jul 24, 2019

@paulgc Actually, I'm having issues building the last tagged release. master builds fine, but on tag v0.13.1 I see:

root[7]7059013cbaa7(wilson.imaging) data-validation # git rev-parse master
2758408a2491a0273698e0ffc1b5746b6f31a958
root[7]7059013cbaa7(wilson.imaging) data-validation # git checkout v0.13.1
Note: checking out 'v0.13.1'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at f51dc7a... Prepare for TFDV 0.13.1 release
root[7]7059013cbaa7(wilson.imaging) data-validation # bazel run -c opt tensorflow_data_validation:build_pip_package
ERROR: /root/.cache/bazel/_bazel_root/28fd227f212b604167c23600bb412aac/external/org_tensorflow/third_party/gpus/cuda_configure.bzl:115:1: load() statements must be called before any other statement. First non-load() statement appears at /root/.cache/bazel/_bazel_root/28fd227f212b604167c23600bb412aac/external/org_tensorflow/third_party/gpus/cuda_configure.bzl:26:1. Use --incompatible_bzl_disallow_load_after_statement=false to temporarily disable this check.
ERROR: error loading package '': in /workdir/data-validation/tensorflow_data_validation/workspace.bzl: in /root/.cache/bazel/_bazel_root/28fd227f212b604167c23600bb412aac/external/org_tensorflow/tensorflow/workspace.bzl: Extension 'third_party/gpus/cuda_configure.bzl' has errors
ERROR: error loading package '': in /workdir/data-validation/tensorflow_data_validation/workspace.bzl: in /root/.cache/bazel/_bazel_root/28fd227f212b604167c23600bb412aac/external/org_tensorflow/tensorflow/workspace.bzl: Extension 'third_party/gpus/cuda_configure.bzl' has errors
INFO: Elapsed time: 5.102s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
FAILED: Build did NOT complete successfully (0 packages loaded)
root[7]7059013cbaa7(wilson.imaging) data-validation # 

If I build from master, the resulting bdist introduces this resolve error into my build, because master's dependency versions have moved ahead of what is released on PyPI (e.g. for apache-beam):

pex.resolver.Unsatisfiable: Could not satisfy all requirements for pyarrow==0.14.0:
    pyarrow==0.14.0, pyarrow<0.15.0,>=0.14.0(from: tensorflow_data_validation==0.14.0.dev0+twtr0), pyarrow<0.14.0,>=0.11.1; python_version >= "3.0" or platform_system != "Windows"(from: apache-beam[gcp]==2.13.0)
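The Unsatisfiable error comes down to interval arithmetic: TFDV master wants pyarrow in [0.14.0, 0.15.0) while apache-beam[gcp]==2.13.0 wants [0.11.1, 0.14.0), and those ranges do not overlap. The sketch below only understands `>=` and `<` clauses on purely numeric versions and ignores environment markers; a real resolver applies full PEP 440 rules (for example via `packaging.specifiers.SpecifierSet`). The helper names are hypothetical.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Parse '0.14.0' into (0, 14); trailing zeros are dropped so 0.14 == 0.14.0."""
    parts = [int(part) for part in text.strip().split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def bounds(spec: str) -> Tuple[Optional[Version], Optional[Version]]:
    """Reduce '>=' / '<' clauses to (inclusive lower bound, exclusive upper bound)."""
    lower: Optional[Version] = None
    upper: Optional[Version] = None
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            v = parse_version(clause[2:])
            lower = v if lower is None else max(lower, v)
        elif clause.startswith("<") and not clause.startswith("<="):
            v = parse_version(clause[1:])
            upper = v if upper is None else min(upper, v)
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return lower, upper


def satisfiable(*specs: str) -> bool:
    """True if at least one version satisfies every spec at once."""
    lower, upper = bounds(",".join(specs))
    return lower is None or upper is None or lower < upper


tfdv = "<0.15.0,>=0.14.0"  # tensorflow_data_validation==0.14.0.dev0+twtr0
beam = "<0.14.0,>=0.11.1"  # apache-beam[gcp]==2.13.0
print(satisfiable(tfdv, beam))  # False
```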

@paulgc
Member

paulgc commented Jul 25, 2019

@kwlzn For Linux binary wheels, PyPI currently only accepts a manylinux platform tag in the wheel name, but it doesn't actually verify that the wheel is manylinux compatible. Hence we get away with this by naming the wheel with the manylinux tag. I see this only as a temporary state while we work towards generating manylinux-compatible wheels.
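To illustrate why the tag alone proves nothing: under the PEP 427 naming scheme, the platform tag is just the last dash-separated field of the filename, and installers pick wheels by comparing these filename tags. Nothing in that step inspects the compiled code inside. `split_wheel_filename` is a hypothetical helper for illustration, not the `packaging` library's parser.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def split_wheel_filename(filename: str) -> WheelTags:
    """Split {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl (no validation)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of components: {filename}")
    return WheelTags(dist, version, build, python, abi, platform)


tags = split_wheel_filename("tensorflow_data_validation-0.13.1-cp36-cp36m-manylinux1_x86_64.whl")
print(tags.platform)  # manylinux1_x86_64
```

Any checking of the actual binary has to happen at build time, for example by building inside the manylinux Docker image or by auditing the wheel before upload.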

Regarding your error when building from master: the dependency mismatch happens because Apache Beam requires pyarrow<0.14.0,>=0.11.1, but TFDV requires pyarrow<0.15.0,>=0.14.0. This error should go away once Apache Beam 2.14 is released (a few days away), since it updates the PyArrow dependency.

For now you can force-install pyarrow>=0.14.0 and it should be fine: TFDV master depends on new features in PyArrow 0.14, so you need the latest PyArrow, and Beam should work fine with it.

@kwlzn
Author

kwlzn commented Jul 25, 2019

@paulgc That sounds like an egregious violation of Python packaging standards to me. Our expectation when we see a manylinux wheel is that it's safely linked and consumable, full stop.

And I completely understand the latter error, but unfortunately we cannot "force install" conflicting deps in our production environment; we actually rely on correct metadata in the packages we consume.

Ideally we'd also run only tagged releases rather than arbitrary cuts from master, but it sounds like the 0.13.1 release is broken badly enough that our best bet is to wait for TFDV 0.14.0 + apache_beam==2.14 to conduct our evaluation?

@paulgc
Member

paulgc commented Jul 26, 2019

@kwlzn Agreed that we are currently in a bad state and are violating the Python packaging standards. We have added scripts to build TFDV in the manylinux2010 Docker image (see documentation here), so the 0.14 wheels should be manylinux compatible.

Unfortunately, you would then have to wait for Beam 2.14, which should be available in a few days. I'll update the thread once it is released.
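As a rough illustration of why manylinux2010 wheels would work on the CentOS 7 host above: each manylinux policy sets a minimum host glibc (2.5 for manylinux1 per PEP 513, 2.12 for manylinux2010 per PEP 571), and installers such as pip perform a comparable check against the running host's glibc. The sketch models only that glibc floor; the policies also cap other symbol versions such as GLIBCXX and CXXABI, which is the part the 0.13.1 wheels violated. `compatible_manylinux_tags` is a hypothetical helper.

```python
from typing import List, Tuple

# Minimum host glibc per policy, newest policy first (PEP 571, PEP 513).
MANYLINUX_GLIBC_FLOOR: List[Tuple[str, Tuple[int, int]]] = [
    ("manylinux2010", (2, 12)),
    ("manylinux1", (2, 5)),
]


def compatible_manylinux_tags(host_glibc: Tuple[int, int], arch: str = "x86_64") -> List[str]:
    """List manylinux platform tags a host could accept, newest first.

    Simplified: only compares the host glibc version with each policy's floor.
    """
    return [
        f"{policy}_{arch}"
        for policy, floor in MANYLINUX_GLIBC_FLOOR
        if host_glibc >= floor
    ]


# CentOS 7 ships glibc-2.17 (see the rpm output above).
print(compatible_manylinux_tags((2, 17)))  # ['manylinux2010_x86_64', 'manylinux1_x86_64']
```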

@paulgc
Member

paulgc commented Aug 6, 2019

@kwlzn Resolving this, as TFDV 0.14 has been released (it depends on Beam 2.14) and its wheels are manylinux compatible.

Let us know if you face any other issues.

paulgc closed this as completed Aug 6, 2019
@kwlzn
Author

kwlzn commented Aug 7, 2019

thanks!
