New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix MPI build issue with bazel #20147
Conversation
Assigning to @jlebar since the code was recently migrated to use new namespace. |
I won't have my laptop until Monday, so apologies for the somewhat vague comment, but rather than depend on XLA to get StatusOr, I'd recommend depending on the relevant header and BUILD dep within StreamExecutor. That's less of a layering violation. |
While trying to build tensorflow with MPI through: ``` ... $ bazel build -s --verbose_failures --config=opt //tensorflow/tools/pip_package:build_pip_package ``` , the following failure happens: ``` tensorflow/contrib/mpi_collectives/kernels/mpi_ops.cc:76:18: error: 'se' does not name a type using StatusOr = se::port::StatusOr<T>; ^ tensorflow/contrib/mpi_collectives/kernels/mpi_ops.cc:139:28: error: 'StatusOr' was not declared in this scope typedef std::function<void(StatusOr<Tensor>)> CommunicationDoneCallback; ... ... ``` This fix addresses the build failures for MPI build. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2131c81
to
83aa3f0
Compare
@jlebar Thanks for the review. I updated the PR and used |
deps = [ | ||
":stream_executor", | ||
], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What problem is this new build rule solving? I thought this was the purpose of //tensorflow/core:stream_executor_headers_lib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jlebar Without the headers_lib, the following error will show up:
ERROR: /home/ubuntu/tensorflow/tensorflow/contrib/mpi_collectives/BUILD:40:1: undeclared inclusion(s) in rule '//tensorflow/contrib/mpi_collectives:python/ops/_mpi_ops.so':
this rule is missing dependency declarations for the following files included by 'tensorflow/contrib/mpi_collectives/kernels/mpi_ops.cc':
'tensorflow/stream_executor/lib/statusor.h'
'tensorflow/stream_executor/lib/status.h'
'tensorflow/stream_executor/lib/error.h'
'tensorflow/stream_executor/lib/stringpiece.h'
'tensorflow/stream_executor/platform/port.h'
'tensorflow/stream_executor/platform/logging.h'
'tensorflow/stream_executor/lib/statusor_internals.h'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
So I added the stream_executor_headers_lib
to address the above undeclared inclusion(s)
issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the headers_lib, the following error will show up:
That explains why you need to add a new dependency to tensorflow/contrib/mpi_collectives/BUILD
.
My question is, can that dependency be //tensorflow/core:stream_executor_headers_lib
instead of this new rule? If not, why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jlebar With //tensorflow/core:stream_executor_headers_lib
the following error will show up:
ERROR: /home/ubuntu/tensorflow/tensorflow/contrib/mpi_collectives/BUILD:40:1: Label '//tensorflow/core:stream_executor_headers_lib' is duplicated in the 'deps' attribute of rule 'python/ops/_mpi_ops_gpu'
ERROR: /home/ubuntu/tensorflow/tensorflow/contrib/mpi_collectives/BUILD:40:1: Label '//tensorflow/core:stream_executor_headers_lib' is duplicated in the 'deps' attribute of rule 'python/ops/_mpi_ops.so_check_deps'
ERROR: /home/ubuntu/tensorflow/tensorflow/contrib/mpi_collectives/BUILD:40:1: Label '//tensorflow/core:stream_executor_headers_lib' is duplicated in the 'deps' attribute of rule 'python/ops/_mpi_ops.so'
I am not sure about the error, though from _mpi_ops_gpu
I guess stream_executor
may have gpu and no-gpu version and that might be the conflict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, indeed, tensorflow/core:stream_executor_headers_lib depends on CUDA stuff. So the new build rule you added is different in this important way.
These build rules are extremely messed up.
I'm happy to take this for now, with the caveat that if and when we clean up the SE build rules, we may break you again. I don't like that, but apparently we have no continuous builds for MPI. :-/
@qlzh727, can you help get this landed? |
While trying to build tensorflow with MPI through:
, the following failure happens:
This fix addresses the build failures for MPI build.
Signed-off-by: Yong Tang yong.tang.github@outlook.com