Make tutorial_example_trainer build on Windows with Bazel #4796

Merged
merged 13 commits into from Oct 11, 2016

Projects

None yet

9 participants

@meteorcloudy
Member
meteorcloudy commented Oct 6, 2016 edited

With these changes, we are able to build the C++ example trainer by
bazel build --host_cpu=x64_windows_msvc --cpu=x64_windows_msvc -c opt tensorflow/cc:tutorials_example_trainer
with Bazel from HEAD on Windows.
I modified the BUILD files carefully, hopefully it won't break TF build on other platform.
@mrry @damienmg @dslomov

meteorcloudy added some commits Sep 21, 2016
@meteorcloudy meteorcloudy Make //tensorflow/core:core_cpu build on Windows
2e07e1b
@meteorcloudy meteorcloudy Fix platform.bzl for cuda configuration on Windows
I am not sure about the naming convention of the dlls, but I did see some dll name like
curand32_42_9.dll or cupti64_65.dll.
7225d72
@meteorcloudy meteorcloudy Fix //tensorflow/core:core_cpu again after sync to tf/master
1. Excluded load_library.cc
2. Fixed logging.h
32b5414
@meteorcloudy meteorcloudy Exclude common_runtime/gpu/ and stream_executor on Windows
57b1294
@meteorcloudy meteorcloudy Using #pragma to include necessary library
2acbb91
@meteorcloudy meteorcloudy Add windows specified sources
26b3eea
@meteorcloudy meteorcloudy Exclude some ops on Windows
Including debug_ops, immutable_constant_op, sparse_matmul_op, fact_op, cswise_op, depthwise_conv_op, svd_op, summary_audio_op

This makes //tensorflow/core:tensorflow build on Windows
3925e88
@meteorcloudy meteorcloudy Add /DLANG_CXX11 in tf_copts for Windows
481d623
@meteorcloudy meteorcloudy Fix example trainer build after sync to windows-20160915
3ccce56
@meteorcloudy meteorcloudy Re-enable svd_op, cwise_op, summary_audio_op, on Windows in Bazel Build
9b4b07e
@meteorcloudy meteorcloudy Fixed lib_proto_parsing by adding 'copts = tf_copts()'
fe74dfa
@tensorflow-jenkins
Collaborator

Can one of the admins verify this patch?

@mention-bot

@meteorcloudy, thanks for your PR! By analyzing the history of the files in this pull request, we identified @josh11b, @vrv and @jhseu to be potential reviewers.

@googlebot googlebot added the cla: yes label Oct 6, 2016
@mrry
Contributor
mrry commented Oct 6, 2016

Thanks for sending this in, Yun!

@mrry
Contributor
mrry commented Oct 6, 2016

@tensorflow-jenkins test this please.

@meteorcloudy
Member

You are welcome! Looks like there are many failures, I'll look into it.

tensorflow/core/BUILD
+ ] +
+ # Protobuf deps already included through the ":lib_proto_parsing"
+ # dependency.
+ tf_additional_proto_srcs(),
@mrry
mrry Oct 6, 2016 Contributor

Nit: The indentation is a bit inconsistent between L955 and here.

@meteorcloudy
meteorcloudy Oct 6, 2016 Member

Thanks, I'll reformat all the BUILD files with buildifier.

tensorflow/core/BUILD
"//third_party/eigen3",
- ],
+ ] + if_not_windows([":stream_executor"]),
@mrry
mrry Oct 6, 2016 Contributor

Instead of making this an (almost) no-op when building on Windows, could we modify :core so that it only pulls in :gpu_runtime on not-Windows?

There are a few targets that depend on :gpu_runtime in this file, but they all appear to depend on :core_cpu, so AFAICT this could be replaced with a dependency on :core.

@meteorcloudy
meteorcloudy Oct 6, 2016 Member

I found there are 4 targets we need to modify if we want to cut the dependency on gpu_runtime.
//tensorflow/core:core,
//tensorflow/core:tensorflow_opensource,
//tensorflow/core/util/ctc:ctc_beam_search_lib,
//tensorflow/core/kernels:array
As you can see, they are scattered in 3 different files, I kind of prefer the original change, so it will be easier when we enable gpu support later. WDYT?

@mrry
mrry Oct 7, 2016 edited Contributor

Some of these dependencies look spurious (especially for ctc_beam_search_lib), but fixing that is probably orthogonal to this change. ("gpu_runtime" seems like a strange thing for CPU-only code to depend on :)....) Fine to leave as is.

@@ -1608,7 +1657,9 @@ tf_kernel_libraries(
"//tensorflow/core:nn_grad",
"//tensorflow/core:nn_ops_op_lib",
"//third_party/eigen3",
- ],
+ ] + if_not_windows([
@mrry
mrry Oct 6, 2016 Contributor

I'm confused why this is treated differently from the tf_kernel_libraries() target, and thus not in :math_not_windows. (It may have been in the wrong place originally, but it'd be good to reduce the complexity if possible.)

I believe it should be possible to remove :depthwise_conv_op from the deps here, and move it into the prefixes of :math_not_windows. Does that make sense?

@meteorcloudy
meteorcloudy Oct 6, 2016 Member

Sounds good, I'll verify if it works

@meteorcloudy
meteorcloudy Oct 6, 2016 Member

Hmm.. As you can see the definition here
Looks like math_now_windows lacks dependencies depthwise_conv_op needs.
:conv_ops, :ops_util, //tensorflow/core:nn_ops_op_lib
Do you think it's fine to add them into deps of math_not_window?

@mrry
mrry Oct 7, 2016 Contributor

Sorry, I should have said that :depthwise_conv_ops should (probably, for consistency) be in a rule called :nn_not_windows. Does that make more sense?

@meteorcloudy
meteorcloudy Oct 7, 2016 Member

Yes, I agree!

+ })
+
+def tf_additional_lib_srcs(exclude = []):
+ return select({
@mrry
mrry Oct 6, 2016 Contributor

Nit: only one space between return and select

meteorcloudy added some commits Oct 6, 2016
@meteorcloudy meteorcloudy Reformat BUILD files
ebc40fb
@meteorcloudy meteorcloudy Refactor BUILD files and make it work on Linux
aae924c
@meteorcloudy
Member

Can we test it again?

@yifeif
Member
yifeif commented Oct 6, 2016

@tensorflow-jenkins test this please.

@yifeif
Member
yifeif commented Oct 7, 2016

@mrry could you take another look?

@meteorcloudy
Member

@mrry Just pushed, please review again. :)

@mrry
Contributor
mrry commented Oct 7, 2016

@tensorflow-jenkins test this please.

@mrry
mrry approved these changes Oct 7, 2016 View changes

Looks good to me - thanks for the cleanup!

@meteorcloudy
Member

From the error message in Linux GPU, looks like we should also pull depthwise_conv_grad_op out of deps, but depthwise_conv_grad_op needs depthwise_conv_op.h. What should we do?

@meteorcloudy
Member

Or maybe they are in the deps for a reason? See these two commits b51ef0c, 99e298a

@mrry
Contributor
mrry commented Oct 7, 2016

I'm not sure what the proper solution would be for them. Assigning to @jmchen-g, since he wrote those ops originally.

@jmchen-g jmchen-g was assigned by mrry Oct 7, 2016
@jmchen-g
Contributor

Which deps are not clear here? If you mean why it is in the deps of nn, Depthwise is one of the convs that we need to support in TF and is commonly used to reduce the computing needs for a model.

@meteorcloudy
Member

OK, I see. Sounds like depthwise_conv_ops is supposed to be in deps of nn. I've deleted my last commit. Please test again?

@meteorcloudy
Member
meteorcloudy commented Oct 11, 2016 edited

@mrry Are you OK with keeping depthwise_conv_ops in deps of nn? If so, I think this PR is ready to be merged.

@mrry
Contributor
mrry commented Oct 11, 2016

@meteorcloudy Yes, that's fine. I suspect our existing build rules are not as clean as they could be for this op, but that shouldn't derail what you're trying to do :).

I'll go ahead and merge now.

@mrry mrry removed the awaiting googler label Oct 11, 2016
@mrry mrry assigned mrry and unassigned jmchen-g Oct 11, 2016
@mrry mrry merged commit 1975cd1 into tensorflow:master Oct 11, 2016

8 checks passed

Android Demo App SUCCESS
Details
Linux CPU Tests SUCCESS
Details
Linux CPU Tests (Python 3) SUCCESS
Details
Linux GPU SUCCESS
Details
MacOS CPU Tests SUCCESS
Details
Sanity Checks SUCCESS
Details
ci.tensorflow.org SUCCESS
Details
cla/google All necessary CLAs are signed
@meteorcloudy
Member

Nice, thanks! \o/

@rohan100jain
Member

This change is causing problems in testing now with BUILD rules. Apparently select() and glob() don't play nicely with each other.

@meteorcloudy
Member

@rohan100jain Can you show me the error message? I wonder why it's not causing problem before.. Users are beginning to try Windows build with Bazel now, reverting this will completely break them.

@rohan100jain
Member

Even after fixing the bug at https://github.com/tensorflow/tensorflow/pull/4796/files#diff-15609f78c0cd35bfdc31f9d8bd9b1e91R984

there are failures like:

https://ci.tensorflow.org/job/tensorflow-cl-sanity/14995/console

I tried various combinations of glob() and select() and none seem to work.

@bssrdf
bssrdf commented Oct 16, 2016

Just want to follow up on this. I managed to build label_image example using the same procedure. Have to fix a bug at bssrdf@58f93bd#diff-f6254560279f78a9d74e66dfd2d85977 for it to build on win. But it failed to run due to some file path issue

$ bazel-bin/tensorflow/examples/label_image/label_image
E C:\tools\msys64\var\tmp\Bazel\s5nfxnhX\execroot\tensorflow\tensorflow\examples\label_image\main.cc:279] Not found: Failed to load compute graph at 'tensorflow/examples/label_image/data/tensorflow_inception_graph.pb'

although the file tensorflow_inception_graph.pb is present in that directory.

@meteorcloudy
Member

@rohan100jain Hmm.. This is weird.
Here's what I tested:
I reset my branch to c20da14, which is just before reverting this change.
And I ran bazel build --nobuild tensorflow/... on Mac, Linux and Windows. It all passed.

For glob() and select(), as long as you don't use select as arguments of glob, they will work just fine. So I'm not sure why it's failing like that on ci.

Can you give me some instructions to reproduce this error?

@meteorcloudy
Member
meteorcloudy commented Oct 17, 2016 edited

I believe there is nothing wrong with this PR, but I am not sure how to get it back. Another PR or can you revert the revert commit? @mrry @rohan100jain

@mrry
Contributor
mrry commented Oct 17, 2016

I think the best way forward is another PR. Should be quick to review if it's the same as the last one :).

@meteorcloudy
Member

OK, PR sent. #5009

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment