bazel compiliation is broken! build failure due to github checksums changing #12979

Closed
d4l3k opened this Issue Sep 11, 2017 · 17 comments

@d4l3k
Contributor

d4l3k commented Sep 11, 2017

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Arch Linux
  • TensorFlow installed from (source or binary): Source
  • TensorFlow version (use command below): master
  • Python version: Python 3.6.2
  • Bazel version (if compiling from source): 0.5.4
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A
  • Exact command to reproduce: bazel build --verbose_failures //tensorflow/contrib/android:libtensorflow_inference.so --crosstool_top=//external:android/crosstool --host_crosstool_top=@bazel_tools//tools/cpp:toolchain --cpu=armeabi-v7a

Describe the problem

GitHub tarball checksums have changed, which makes it impossible to build TensorFlow because the downloaded archives no longer match the expected checksums.

bazelbuild/bazel#3722

Source code / logs

ERROR: /home/travis/tensorflow/tensorflow/contrib/android/BUILD:72:1: error loading package 'tensorflow/core': Encountered error while reading extension file 'protobuf.bzl': no such package '@protobuf//': java.io.IOException: Error downloading [https://github.com/google/protobuf/archive/0b059a3d8a8f8aa40dde7bea55edca4ec5dfea66.tar.gz, http://mirror.bazel.build/github.com/google/protobuf/archive/0b059a3d8a8f8aa40dde7bea55edca4ec5dfea66.tar.gz] to /home/travis/.cache/bazel/_bazel_travis/c397b760afc31b444fffb10b0086dea5/external/protobuf/0b059a3d8a8f8aa40dde7bea55edca4ec5dfea66.tar.gz: Checksum was e5fdeee6b28cf6c38d61243adff06628baa434a22b5ebb7432d2a7fbabbdb13d but wanted 6d43b9d223ce09e5d4ce8b0060cb8a7513577a35a64c7e3dad10f0703bf3ad93 and referenced by '//tensorflow/contrib/android:libtensorflow_inference.so'
 /tmp/foo  curl -L https://github.com/google/protobuf/archive/0b059a3d8a8f8aa40dde7bea55edca4ec5dfea66.tar.gz | sha256sum
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   157    0   157    0     0    157      0 --:--:-- --:--:-- --:--:--   301
100 4274k  100 4274k    0     0  4274k      0  0:00:01  0:00:01 --:--:-- 8710k
e5fdeee6b28cf6c38d61243adff06628baa434a22b5ebb7432d2a7fbabbdb13d  -
 /tmp/foo  curl http://mirror.bazel.build/github.com/google/protobuf/archive/0b059a3d8a8f8aa40dde7bea55edca4ec5dfea66.tar.gz | sha256sum
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4274k  100 4274k    0     0  4274k      0  0:00:01 --:--:--  0:00:01 6177k
6d43b9d223ce09e5d4ce8b0060cb8a7513577a35a64c7e3dad10f0703bf3ad93  -
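
The manual comparison above can be scripted. What follows is a minimal sketch, assuming GNU coreutils `sha256sum` is installed; the helper name `verify_sha256` is hypothetical and is not part of Bazel. The demo runs offline on a throwaway file rather than on a real download.

```shell
#!/bin/sh
# Sketch: check a file against an expected SHA-256 and report a mismatch in
# the same "was ... but wanted ..." form as the build error.
# verify_sha256 is a hypothetical helper name, not a Bazel or GitHub API.

verify_sha256() {
  vs_file="$1"
  vs_wanted="$2"
  # sha256sum prints "<hash>  <name>"; keep only the hash field.
  vs_actual="$(sha256sum "$vs_file" | cut -d' ' -f1)"
  if [ "$vs_actual" = "$vs_wanted" ]; then
    echo "OK: $vs_file"
  else
    echo "Checksum was $vs_actual but wanted $vs_wanted" >&2
    return 1
  fi
}

# Offline demo: hash a temporary file, then verify it against its own hash.
demo_file="$(mktemp)"
printf 'hello\n' > "$demo_file"
demo_hash="$(sha256sum "$demo_file" | cut -d' ' -f1)"
verify_sha256 "$demo_file" "$demo_hash"
rm -f "$demo_file"
```

For a real archive, the same function could be called after saving the download to disk, for example `curl -L -o protobuf.tar.gz <url>` followed by `verify_sha256 protobuf.tar.gz <wanted hash>`.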

@d4l3k d4l3k changed the title from build failure due to github checksums changing to bazel compiliation is broken! build failure due to github checksums changing Sep 11, 2017

@andrewharp

Member

andrewharp commented Sep 11, 2017

As a short-term workaround, you can force the sha256sum to match by removing the github.com entry, so that the archive is fetched only from mirror.bazel.build.

@d4l3k

Contributor

d4l3k commented Sep 11, 2017

Temporary fix:

sed -i '\@https://github.com/google/protobuf/archive/0b059a3d8a8f8aa40dde7bea55edca4ec5dfea66.tar.gz@d' tensorflow/workspace.bzl
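
To see what that expression deletes, here is a small sketch that runs the same sed script on a sample fragment instead of editing the file in place. The fragment is an assumed layout for the protobuf entry, reconstructed only from the URLs and checksum in the error message; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: apply the sed expression from the comment above to stdin.
# The sample entry is an assumption about how the URLs are listed in
# tensorflow/workspace.bzl; only the URLs and hash come from the error log.

drop_github_url() {
  # \@...@ selects @ as the regex delimiter, so the slashes in the URL need
  # no escaping; d deletes every line that matches.
  sed '\@https://github.com/google/protobuf/archive/0b059a3d8a8f8aa40dde7bea55edca4ec5dfea66.tar.gz@d'
}

sample_entry() {
  cat <<'EOF'
  urls = [
      "https://github.com/google/protobuf/archive/0b059a3d8a8f8aa40dde7bea55edca4ec5dfea66.tar.gz",
      "http://mirror.bazel.build/github.com/google/protobuf/archive/0b059a3d8a8f8aa40dde7bea55edca4ec5dfea66.tar.gz",
  ],
  sha256 = "6d43b9d223ce09e5d4ce8b0060cb8a7513577a35a64c7e3dad10f0703bf3ad93",
EOF
}

# Only the github.com line is removed; the mirror URL and sha256 remain.
sample_entry | drop_github_url
```

The mirror URL survives because it contains `http://mirror.bazel.build/github.com/...`, which does not match the `https://github.com/...` prefix in the pattern.
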
@lissyx

Contributor

lissyx commented Sep 12, 2017

Would anyone happen to know whether this is expected behavior from GitHub? What is going to happen to the tarballs hosted on mirror.bazel.build: are they going to be updated? Should we trust the new sha256? I have spotted that the URL I use for the RPi3 toolchain download is affected as well:

ERROR: /home/build-user/DeepSpeech/tf/tools/arm_compiler/BUILD:116:1: no such package '@GccArmRpi//': Error downloading [https://github.com/raspberrypi/tools/archive/0e906ebc527eab1cdbf7adabff5b474da9562e9f.tar.gz] to /home/build-user/.cache/bazel/_bazel_build-user/c049635af10109d54fe54c6ebd9031b2/external/GccArmRpi/0e906ebc527eab1cdbf7adabff5b474da9562e9f.tar.gz: Checksum was 4c622a5c7b9feb9615d4723b03a13142a7f3f813f9296861d5401282b9fbea96 but wanted 970285762565c7890c6c087d262b0a18286e7d0384f13a37786d8521773bc969 and referenced by '//tools/arm_compiler:gcc_linux_linker_files'.
@bzamecnik

bzamecnik commented Sep 12, 2017

As a workaround, just comment out the sha256 checksum lines in tensorflow/workspace.bzl. The checksums probably changed because of some library change on GitHub's side.

@lissyx

Contributor

lissyx commented Sep 13, 2017

Answer to myself: libgit2/libgit2#4343 (comment)
TL;DR: it confirms that GitHub changed the code used to produce the tarballs, and that the way they are used is fundamentally risky.

@svenstaro

svenstaro commented Sep 16, 2017

There should probably be a new tensorflow release with those fixes since currently no one can build the latest release.

@tlc

tlc commented Sep 18, 2017

I've hit this, too. Unfortunately I removed my bazel cache as I was trying to figure it out.
Now it seems to fail early on 'gemmlowp'.

Should I just comment out all sha256 lines? Is there a better workaround?

$ bazel build ${BAZEL_OPTS} tensorflow_serving/...
WARNING: ignoring http_proxy in environment.
WARNING: /home/troy/.cache/bazel/_bazel_troy/d52b2ff19a6bd234d2c10cb6bf93de82/external/org_tensorflow/third_party/py/python_configure.bzl:30:3: Python Configuration Warning: 'PYTHON_LIB_PATH' environment variable is not set, using '/usr/local/lib/python2.7/dist-packages' as default.
ERROR: /home/troy/.cache/bazel/_bazel_troy/d52b2ff19a6bd234d2c10cb6bf93de82/external/org_tensorflow/tensorflow/core/kernels/neon/BUILD:27:1: no such package '@gemmlowp//': Error downloading [http://mirror.bazel.build/github.com/google/gemmlowp/archive/010bb3e71a26ca1d0884a167081d092b43563996.tar.gz, https://github.com/google/gemmlowp/archive/010bb3e71a26ca1d0884a167081d092b43563996.tar.gz] to /home/troy/.cache/bazel/_bazel_troy/d52b2ff19a6bd234d2c10cb6bf93de82/external/gemmlowp/010bb3e71a26ca1d0884a167081d092b43563996.tar.gz: Checksum was 861cc6d9d902861f54fd77e1ab79286477dcc559b2a283e75b9c22d37b61f6ae but wanted 0d7a44327e26b622ee08faaea10f8d10b439bcfda622f9c98be1c036bc645cad and referenced by '@org_tensorflow//tensorflow/core/kernels/neon:neon_depthwise_conv_op'.
ERROR: Analysis of target '//tensorflow_serving/sources/storage_path:file_system_storage_path_source' failed; build aborted.
INFO: Elapsed time: 9.370s

@thomasjo

thomasjo commented Sep 18, 2017

@tlc I added the following (temporary) line to one of my Dockerfiles. Note that this completely disables checksum validation, so this is probably a really stupid idea — you have been warned...

sed -ri "/^\W+sha256 = \"[^\"]+\"\W+$/d" tensorflow/workspace.bzl
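
Before deleting anything, it may help to list the lines that pattern would remove. This is a sketch only, assuming GNU grep (for `\W`); `list_sha256_lines` is a hypothetical helper and the sample entry is made up.

```shell
#!/bin/sh
# Sketch: dry run for the sed command above. Prints matching lines with
# their line numbers and modifies nothing.

list_sha256_lines() {
  # Same pattern as the sed command: leading non-word characters
  # (the indentation), a quoted sha256 value, then trailing non-word
  # characters (the comma).
  grep -nE '^\W+sha256 = "[^"]+"\W+$' "$1"
}

# Made-up entry, used only so the demo runs without a TensorFlow checkout.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
  native.http_archive(
      name = "example_dep",
      urls = ["http://example.invalid/example_dep.tar.gz"],
      sha256 = "0000000000000000000000000000000000000000000000000000000000000000",
      strip_prefix = "example_dep",
  )
EOF

list_sha256_lines "$sample"
rm -f "$sample"
```

Against a real checkout, the call would be `list_sha256_lines tensorflow/workspace.bzl`.
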
@tlc

tlc commented Sep 18, 2017

The number of things to comment out is currently small.

Building TensorFlow Serving, I only had to comment out 'gemmlowp'.

Building tensorflow/tools/pip_package:build_pip_package, I only had to comment out 'boringssl'.
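
Commenting out the checksum of a single dependency, rather than all of them, can be sketched with a sed address range. The helper name `comment_out_sha256` is hypothetical, and the entry layout (a `name = "..."` line first, the closing parenthesis on its own line) is an assumption; the gemmlowp URL and hash are the ones from the error log above, while `other_dep` is made up.

```shell
#!/bin/sh
# Sketch: comment out the sha256 line for one named dependency only.
# Reads a workspace-style file on stdin and writes the result to stdout.

comment_out_sha256() {
  dep_name="$1"
  # Limit the substitution to the range from the dependency's name line to
  # the next line containing ")", then prefix the sha256 line with "# "
  # while keeping its indentation.
  sed "/name = \"${dep_name}\"/,/)/ s/^\\( *\\)sha256 = /\\1# sha256 = /"
}

comment_out_sha256 gemmlowp <<'EOF'
  native.http_archive(
      name = "gemmlowp",
      urls = [
          "http://mirror.bazel.build/github.com/google/gemmlowp/archive/010bb3e71a26ca1d0884a167081d092b43563996.tar.gz",
      ],
      sha256 = "0d7a44327e26b622ee08faaea10f8d10b439bcfda622f9c98be1c036bc645cad",
  )
  native.http_archive(
      name = "other_dep",
      urls = ["http://example.invalid/other_dep.tar.gz"],
      sha256 = "0000000000000000000000000000000000000000000000000000000000000000",
  )
EOF
```

Only the gemmlowp checksum is commented out; the second entry keeps its verification. As with the other workarounds in this thread, this disables checksum validation for that archive.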

@aselle

Member

aselle commented Sep 21, 2017

@gunan

Member

gunan commented Sep 21, 2017

Ack.
Looks like we have to mirror all GitHub URLs ourselves and not link to GitHub directly. Apparently GitHub makes no guarantees about the checksums of the archives it provides.

@tycho

tycho commented Sep 21, 2017

There's no technical reason why they couldn't provide archives with consistent checksums. The inputs used to create the tarballs are constant for any given commit/tag hash (same timestamps, other metadata and content), so it should be feasible to create archives with consistent hashes. Have we investigated what the actual differences are? Is the compression somehow introducing the variability? Perhaps hashing the uncompressed .tar files would be more reliable?

@gunan

Member

gunan commented Sep 21, 2017

libgit2/libgit2#4343 (comment) is the explanation.

We on the TF end are bound by the feature Bazel gives us, which checks the archive checksum after downloading it. We can ask Bazel for a new feature that checks the source tree checksum, but it would not land until the next Bazel release, and we need to fix things before then. So the quickest fix seems to be self-mirroring.

@tycho

tycho commented Sep 21, 2017

I didn't mean source tree checksums, although that'd be reasonable too I guess. I meant just cat foo.tar.gz | gzip -d | sha256sum -
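
The pipeline in that comment can be tried offline. The sketch below assumes `gzip`, `tar` and GNU coreutils `sha256sum`; the helper name `uncompressed_sha256` and all file names are made up. It compresses one tar file twice with different settings and hashes the decompressed stream of each.

```shell
#!/bin/sh
# Sketch: hash the decompressed tar stream instead of the .tar.gz bytes.

uncompressed_sha256() {
  # gzip -dc writes the decompressed data to stdout and leaves the file alone.
  gzip -dc "$1" | sha256sum | cut -d' ' -f1
}

workdir="$(mktemp -d)"
printf 'example contents\n' > "$workdir/file.txt"
tar -cf "$workdir/src.tar" -C "$workdir" file.txt

# Same tar, two compression levels.
gzip -1 -c "$workdir/src.tar" > "$workdir/fast.tar.gz"
gzip -9 -c "$workdir/src.tar" > "$workdir/best.tar.gz"

echo "compressed fast:   $(sha256sum < "$workdir/fast.tar.gz" | cut -d' ' -f1)"
echo "compressed best:   $(sha256sum < "$workdir/best.tar.gz" | cut -d' ' -f1)"
echo "uncompressed fast: $(uncompressed_sha256 "$workdir/fast.tar.gz")"
echo "uncompressed best: $(uncompressed_sha256 "$workdir/best.tar.gz")"
rm -rf "$workdir"
```

The two uncompressed hashes are equal because both archives decompress to the same tar file. This only helps if the variability comes from compression; if the tar stream itself differs between downloads, the uncompressed hashes would differ too, and this thread does not establish which is the case.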

codrut3 added a commit to codrut3/tensorflow that referenced this issue Oct 1, 2017

esmanchik added a commit to esmanchik/tensorflow that referenced this issue Oct 9, 2017

achimnol added a commit to lablup/backend.ai-kernels that referenced this issue Nov 1, 2017

rlencou added a commit to LogiVideo/tensorflow that referenced this issue Dec 13, 2017

Update workspace.bzl
tensorflow#12979

try removing github mirror rather than changing the sha256.

it seems the tarball has wrong sha
@ziyuang

ziyuang commented Jan 3, 2018

Is my case related?

$ bazel build --config=opt --config=cuda --config=mkl //tensorflow/tools/pip_package:build_pip_package                                                                        
ERROR: /home/linzi/Downloads/tensorflow-1.4.1/tensorflow/tools/pip_package/BUILD:139:1: error loading package 'tensorflow': Encountered error while reading extension file 'protobuf.bzl': no such package '@protobuf_archive//': java.io.IOException: Error downloading [http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz] to /home/linzi/.cache/bazel/_bazel_linzi/68376711a6e4ce84b78bea12ff84978f/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz: sun.security.validator.ValidatorException: End user tried to act as a CA and referenced by '//tensorflow/tools/pip_package:build_pip_package'
ERROR: /home/linzi/Downloads/tensorflow-1.4.1/tensorflow/tools/pip_package/BUILD:139:1: error loading package 'tensorflow': Encountered error while reading extension file 'protobuf.bzl': no such package '@protobuf_archive//': java.io.IOException: Error downloading [http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz] to /home/linzi/.cache/bazel/_bazel_linzi/68376711a6e4ce84b78bea12ff84978f/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz: sun.security.validator.ValidatorException: End user tried to act as a CA and referenced by '//tensorflow/tools/pip_package:build_pip_package'
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted: error loading package 'tensorflow': Encountered error while reading extension file 'protobuf.bzl': no such package '@protobuf_archive//': java.io.IOException: Error downloading [http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz] to /home/linzi/.cache/bazel/_bazel_linzi/68376711a6e4ce84b78bea12ff84978f/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz: sun.security.validator.ValidatorException: End user tried to act as a CA
INFO: Elapsed time: 13.775s
FAILED: Build did NOT complete successfully (5 packages loaded)
    currently loading: tensorflow
@gunan

Member

gunan commented Jan 3, 2018

No, in your case the issue is with the certificates on your system.

@ziyuang

ziyuang commented Jan 3, 2018

@gunan Using the master HEAD instead of 1.4.1 resolves the issue but I don't know why.
