1.15.2: _pywrap_tensorflow_internal.so differs between builds #37997
Comments
|
extra info: our openSUSE tensorflow2-2.1.0 package is also not affected |
|
tensorflow-1.15.3 still has these variations in |
|
Looking at the diff of the build trees again, I found this diff:
+++ org_tensorflow/bazel-out/k8-opt/genfiles/tensorflow/compiler/mlir/lite/transforms/generated_prepare_tf.inc
@@ -289,7 +289,7 @@
*/
struct GeneratedConvert0 : public RewritePattern {
GeneratedConvert0(MLIRContext *context)
- : RewritePattern("tf.FusedBatchNorm", {"tf.Add", "tf.Mul", "tf.Rsqrt", "tf.Sub", "tf.Const"}, 1, context) {}
+ : RewritePattern("tf.FusedBatchNorm", {"tf.Const", "tf.Rsqrt", "tf.Add", "tf.Mul", "tf.Sub"}, 1, context) {}
generated by a bazel call of from |
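The reordered operation list in the diff above is the kind of variation that appears when a code generator emits the elements of an unordered collection in whatever order iteration happens to produce. The sketch below is not mlir-tblgen's code, and the issue does not confirm this as the cause; `emit_pattern` and its parameters are hypothetical names, used only to show how sorting before emission makes the generated line independent of input order.

```python
def emit_pattern(root_op, generated_ops):
    """Render a RewritePattern constructor line for a hypothetical generator.

    generated_ops may be any iterable, including one whose iteration
    order is not stable between runs. Sorting removes that dependence.
    """
    names = ", ".join('"%s"' % op for op in sorted(generated_ops))
    return ': RewritePattern("%s", {%s}, 1, context) {}' % (root_op, names)


# The two orderings seen in the diff produce the same line once sorted.
first = emit_pattern("tf.FusedBatchNorm",
                     ["tf.Add", "tf.Mul", "tf.Rsqrt", "tf.Sub", "tf.Const"])
second = emit_pattern("tf.FusedBatchNorm",
                      ["tf.Const", "tf.Rsqrt", "tf.Add", "tf.Mul", "tf.Sub"])
print(first == second)
```

Sorting is only one way to get a stable order; a generator could equally keep the elements in insertion order, as long as the order does not depend on memory addresses.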
|
Can you check against master please? We won't be able to fix this on patch releases since it seems to be a significant amount of change. |
|
There seems to be a large difference from v1.15.3..master and master branch lacks the |
|
It is expected that there are differences between master and |
|
I built that llvm master mlir-tblgen and when omitting the I tried to compare it to the mlir-tblgen from 1.15.3, but that dumped core on the same call. So it seems I know too little to debug this properly. |
|
Hi There, We are checking to see if you still need help on this, as you are using an older version of tensorflow which is officially considered end of life. We recommend that you upgrade to the latest 2.x version and let us know if the issue still persists in newer versions. Please open a new issue for any help you need against 2.x, and we will get you the right help. This issue will be closed automatically 7 days from now. If you still need help with this issue, please provide us with more information. |
System information
Describe the problem
While working on reproducible builds for openSUSE, I found that
our tensorflow-1.15.2 package varied across builds.
See https://reproducible-builds.org/ for why this matters.
The variations do not occur when disabling ASLR for the build.
The previous 1.13.2 version, built with python-3.7, still built reproducibly.
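As a rough illustration of the ASLR observation, a command can be run with address space layout randomization disabled through the util-linux `setarch` tool and its `-R` option. This is a minimal sketch assuming a Linux host with `setarch` installed; `without_aslr` and `run_without_aslr` are hypothetical helper names, and how such a wrapper would be wired into the actual package build is not shown in this report.

```python
import platform
import subprocess


def without_aslr(command):
    """Prefix a command so that setarch runs it with ASLR disabled (-R)."""
    return ["setarch", platform.machine(), "-R"] + list(command)


def run_without_aslr(command):
    """Run the command under setarch and raise if it exits non-zero."""
    return subprocess.run(without_aslr(command), check=True)
```

If a generator's output is stable under this wrapper but varies without it, that points at something address-dependent, such as ordering by pointer value.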
Provide the exact sequence of commands / steps that you executed before running into the problem
build tensorflow twice from scratch:
osc checkout openSUSE:Factory/tensorflow && cd $_
osc build --noservice --keep-pkg=RPMS
and compare resulting _pywrap_tensorflow_internal.so content
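The comparison step can be scripted. The following is a minimal sketch that hashes two copies of a file and reports whether they match; `sha256_of` and `builds_match` are hypothetical helper names, and the paths passed in would be wherever the two builds left their copies of the library.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(path_a, path_b):
    """True when both files have identical content."""
    return sha256_of(path_a) == sha256_of(path_b)
```

A matching digest means the two builds produced bit-identical output for that file; a mismatch says only that they differ, not where.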
Any other info / logs
/usr/lib64/python3.8/site-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so differs in assembler output.
By building without Link Time Optimization (LTO), I could see that exactly one .o file differed in the build environment:
Binary files /var/tmp/build-root.10/.mount/home/abuild/rpmbuild/SOURCES/BAZEL/_bazel_abuild/089fd2236bcbfcbcf994cdf39cd6bcb6/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/compiler/mlir/lite/_objs/tensorflow_lite_legalize_tf/prepare_tf.pic.o and /var/tmp/build-root.10b/.mount/home/abuild/rpmbuild/SOURCES/BAZEL/_bazel_abuild/089fd2236bcbfcbcf994cdf39cd6bcb6/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/compiler/mlir/lite/_objs/tensorflow_lite_legalize_tf/prepare_tf.pic.o differ
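Narrowing a difference down to individual object files can be done by walking the two build trees side by side. This is a sketch under the assumption that both trees share the same relative layout; `differing_files` is a hypothetical helper, not part of any build tool.

```python
import filecmp
import os


def differing_files(tree_a, tree_b, suffix=".o"):
    """Return sorted relative paths of files ending in suffix that differ.

    Files that exist in tree_a but not in tree_b are reported as well.
    """
    result = []
    for root, _dirs, files in os.walk(tree_a):
        for name in files:
            if not name.endswith(suffix):
                continue
            path_a = os.path.join(root, name)
            rel = os.path.relpath(path_a, tree_a)
            path_b = os.path.join(tree_b, rel)
            # shallow=False forces a byte-by-byte comparison.
            if not os.path.isfile(path_b) or not filecmp.cmp(
                    path_a, path_b, shallow=False):
                result.append(rel)
    return sorted(result)
```

Called with the two build roots, it returns the list of object files worth disassembling and comparing.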
also the asm diff contained
that comes from
tensorflow-1.15.2/tensorflow/compiler/mlir/lite/transforms/prepare_composite_functions_tf.cc
which is very close to the prepare_tf.cc file used to create the differing .o file
It is possible that the nondeterminism comes from within gcc9 and gcc10, triggered by some special feature used in prepare_tf.cc, but to prove that I would need a preprocessed version of that compilation. Due to the size and complexity of the build process, I have not managed to get that yet.