New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make tutorial_example_trainer build on Windows with Bazel #4796

Merged
merged 13 commits into from Oct 11, 2016

Conversation

Projects
None yet
9 participants
@meteorcloudy
Member

meteorcloudy commented Oct 6, 2016

With these changes, we are able to build the C++ example trainer by
bazel build --host_cpu=x64_windows_msvc --cpu=x64_windows_msvc -c opt tensorflow/cc:tutorials_example_trainer
with Bazel from HEAD on Windows.
I modified the BUILD files carefully, hopefully it won't break TF build on other platform.
@mrry @damienmg @dslomov

@tensorflow-jenkins

This comment has been minimized.

Show comment
Hide comment
@tensorflow-jenkins

tensorflow-jenkins Oct 6, 2016

Collaborator

Can one of the admins verify this patch?

Collaborator

tensorflow-jenkins commented Oct 6, 2016

Can one of the admins verify this patch?

@mention-bot

This comment has been minimized.

Show comment
Hide comment
@mention-bot

mention-bot Oct 6, 2016

@meteorcloudy, thanks for your PR! By analyzing the history of the files in this pull request, we identified @josh11b, @vrv and @jhseu to be potential reviewers.

@meteorcloudy, thanks for your PR! By analyzing the history of the files in this pull request, we identified @josh11b, @vrv and @jhseu to be potential reviewers.

@googlebot googlebot added the cla: yes label Oct 6, 2016

@mrry

This comment has been minimized.

Show comment
Hide comment
@mrry

mrry Oct 6, 2016

Contributor

Thanks for sending this in, Yun!

Contributor

mrry commented Oct 6, 2016

Thanks for sending this in, Yun!

@mrry

This comment has been minimized.

Show comment
Hide comment
@mrry

mrry Oct 6, 2016

Contributor

@tensorflow-jenkins test this please.

Contributor

mrry commented Oct 6, 2016

@tensorflow-jenkins test this please.

@meteorcloudy

This comment has been minimized.

Show comment
Hide comment
@meteorcloudy

meteorcloudy Oct 6, 2016

Member

You are welcome! Looks like there are many failures, I'll look into it.

Member

meteorcloudy commented Oct 6, 2016

You are welcome! Looks like there are many failures, I'll look into it.

Show outdated Hide outdated tensorflow/core/BUILD
] +
# Protobuf deps already included through the ":lib_proto_parsing"
# dependency.
tf_additional_proto_srcs(),

This comment has been minimized.

@mrry

mrry Oct 6, 2016

Contributor

Nit: The indentation is a bit inconsistent between L955 and here.

@mrry

mrry Oct 6, 2016

Contributor

Nit: The indentation is a bit inconsistent between L955 and here.

This comment has been minimized.

@meteorcloudy

meteorcloudy Oct 6, 2016

Member

Thanks, I'll reformat all the BUILD files with buildifier.

@meteorcloudy

meteorcloudy Oct 6, 2016

Member

Thanks, I'll reformat all the BUILD files with buildifier.

Show outdated Hide outdated tensorflow/core/BUILD
"//third_party/eigen3",
],
] + if_not_windows([":stream_executor"]),

This comment has been minimized.

@mrry

mrry Oct 6, 2016

Contributor

Instead of making this an (almost) no-op when building on Windows, could we modify :core so that it only pulls in :gpu_runtime on not-Windows?

There are a few targets that depend on :gpu_runtime in this file, but they all appear to depend on :core_cpu, so AFAICT this could be replaced with a dependency on :core.

@mrry

mrry Oct 6, 2016

Contributor

Instead of making this an (almost) no-op when building on Windows, could we modify :core so that it only pulls in :gpu_runtime on not-Windows?

There are a few targets that depend on :gpu_runtime in this file, but they all appear to depend on :core_cpu, so AFAICT this could be replaced with a dependency on :core.

This comment has been minimized.

@meteorcloudy

meteorcloudy Oct 6, 2016

Member

I found there are 4 targets we need to modify if we want to cut the dependency on gpu_runtime.
//tensorflow/core:core,
//tensorflow/core:tensorflow_opensource,
//tensorflow/core/util/ctc:ctc_beam_search_lib,
//tensorflow/core/kernels:array
As you can see, they are scattered in 3 different files, I kind of prefer the original change, so it will be easier when we enable gpu support later. WDYT?

@meteorcloudy

meteorcloudy Oct 6, 2016

Member

I found there are 4 targets we need to modify if we want to cut the dependency on gpu_runtime.
//tensorflow/core:core,
//tensorflow/core:tensorflow_opensource,
//tensorflow/core/util/ctc:ctc_beam_search_lib,
//tensorflow/core/kernels:array
As you can see, they are scattered in 3 different files, I kind of prefer the original change, so it will be easier when we enable gpu support later. WDYT?

This comment has been minimized.

@mrry

mrry Oct 7, 2016

Contributor

Some of these dependencies look spurious (especially for ctc_beam_search_lib), but fixing that is probably orthogonal to this change. ("gpu_runtime" seems like a strange thing for CPU-only code to depend on :)....) Fine to leave as is.

@mrry

mrry Oct 7, 2016

Contributor

Some of these dependencies look spurious (especially for ctc_beam_search_lib), but fixing that is probably orthogonal to this change. ("gpu_runtime" seems like a strange thing for CPU-only code to depend on :)....) Fine to leave as is.

@@ -1608,7 +1657,9 @@ tf_kernel_libraries(
"//tensorflow/core:nn_grad",
"//tensorflow/core:nn_ops_op_lib",
"//third_party/eigen3",
],
] + if_not_windows([

This comment has been minimized.

@mrry

mrry Oct 6, 2016

Contributor

I'm confused why this is treated differently from the tf_kernel_libraries() target, and thus not in :math_not_windows. (It may have been in the wrong place originally, but it'd be good to reduce the complexity if possible.)

I believe it should be possible to remove :depthwise_conv_op from the deps here, and move it into the prefixes of :math_not_windows. Does that make sense?

@mrry

mrry Oct 6, 2016

Contributor

I'm confused why this is treated differently from the tf_kernel_libraries() target, and thus not in :math_not_windows. (It may have been in the wrong place originally, but it'd be good to reduce the complexity if possible.)

I believe it should be possible to remove :depthwise_conv_op from the deps here, and move it into the prefixes of :math_not_windows. Does that make sense?

This comment has been minimized.

@meteorcloudy

meteorcloudy Oct 6, 2016

Member

Sounds good, I'll verify if it works

@meteorcloudy

meteorcloudy Oct 6, 2016

Member

Sounds good, I'll verify if it works

This comment has been minimized.

@meteorcloudy

meteorcloudy Oct 6, 2016

Member

Hmm.. As you can see the definition here
Looks like math_now_windows lacks dependencies depthwise_conv_op needs.
:conv_ops, :ops_util, //tensorflow/core:nn_ops_op_lib
Do you think it's fine to add them into deps of math_not_window?

@meteorcloudy

meteorcloudy Oct 6, 2016

Member

Hmm.. As you can see the definition here
Looks like math_now_windows lacks dependencies depthwise_conv_op needs.
:conv_ops, :ops_util, //tensorflow/core:nn_ops_op_lib
Do you think it's fine to add them into deps of math_not_window?

This comment has been minimized.

@mrry

mrry Oct 7, 2016

Contributor

Sorry, I should have said that :depthwise_conv_ops should (probably, for consistency) be in a rule called :nn_not_windows. Does that make more sense?

@mrry

mrry Oct 7, 2016

Contributor

Sorry, I should have said that :depthwise_conv_ops should (probably, for consistency) be in a rule called :nn_not_windows. Does that make more sense?

This comment has been minimized.

@meteorcloudy

meteorcloudy Oct 7, 2016

Member

Yes, I agree!

@meteorcloudy

meteorcloudy Oct 7, 2016

Member

Yes, I agree!

Show outdated Hide outdated tensorflow/core/platform/default/build_config.bzl
})
def tf_additional_lib_srcs(exclude = []):
return select({

This comment has been minimized.

@mrry

mrry Oct 6, 2016

Contributor

Nit: only one space between return and select

@mrry

mrry Oct 6, 2016

Contributor

Nit: only one space between return and select

This comment has been minimized.

@meteorcloudy

meteorcloudy Oct 6, 2016

Member

Thanks~

@meteorcloudy

This comment has been minimized.

Show comment
Hide comment
@meteorcloudy

meteorcloudy Oct 6, 2016

Member

Can we test it again?

Member

meteorcloudy commented Oct 6, 2016

Can we test it again?

@yifeif

This comment has been minimized.

Show comment
Hide comment
@yifeif

yifeif Oct 6, 2016

Member

@tensorflow-jenkins test this please.

Member

yifeif commented Oct 6, 2016

@tensorflow-jenkins test this please.

@yifeif

This comment has been minimized.

Show comment
Hide comment
@yifeif

yifeif Oct 7, 2016

Member

@mrry could you take another look?

Member

yifeif commented Oct 7, 2016

@mrry could you take another look?

@meteorcloudy

This comment has been minimized.

Show comment
Hide comment
@meteorcloudy

meteorcloudy Oct 7, 2016

Member

@mrry Just pushed, please review again. :)

Member

meteorcloudy commented Oct 7, 2016

@mrry Just pushed, please review again. :)

@mrry

This comment has been minimized.

Show comment
Hide comment
@mrry

mrry Oct 7, 2016

Contributor

@tensorflow-jenkins test this please.

Contributor

mrry commented Oct 7, 2016

@tensorflow-jenkins test this please.

@mrry

mrry approved these changes Oct 7, 2016

Looks good to me - thanks for the cleanup!

@meteorcloudy

This comment has been minimized.

Show comment
Hide comment
@meteorcloudy

meteorcloudy Oct 7, 2016

Member

From the error message in Linux GPU, looks like we should also pull depthwise_conv_grad_op out of deps, but depthwise_conv_grad_op needs depthwise_conv_op.h. What should we do?

Member

meteorcloudy commented Oct 7, 2016

From the error message in Linux GPU, looks like we should also pull depthwise_conv_grad_op out of deps, but depthwise_conv_grad_op needs depthwise_conv_op.h. What should we do?

@meteorcloudy

This comment has been minimized.

Show comment
Hide comment
@meteorcloudy

meteorcloudy Oct 7, 2016

Member

Or maybe they are in the deps for a reason? See these two commits b51ef0c, 99e298a

Member

meteorcloudy commented Oct 7, 2016

Or maybe they are in the deps for a reason? See these two commits b51ef0c, 99e298a

@mrry

This comment has been minimized.

Show comment
Hide comment
@mrry

mrry Oct 7, 2016

Contributor

I'm not sure what the proper solution would be for them. Assigning to @jmchen-g, since he wrote those ops originally.

Contributor

mrry commented Oct 7, 2016

I'm not sure what the proper solution would be for them. Assigning to @jmchen-g, since he wrote those ops originally.

@jmchen-g

This comment has been minimized.

Show comment
Hide comment
@jmchen-g

jmchen-g Oct 10, 2016

Contributor

Which deps are not clear here? If you mean why it is in the deps of nn, Depthwise is one of the convs that we need to support in TF and is commonly used to reduce the computing needs for a model.

Contributor

jmchen-g commented Oct 10, 2016

Which deps are not clear here? If you mean why it is in the deps of nn, Depthwise is one of the convs that we need to support in TF and is commonly used to reduce the computing needs for a model.

@meteorcloudy

This comment has been minimized.

Show comment
Hide comment
@meteorcloudy

meteorcloudy Oct 10, 2016

Member

OK, I see. Sounds like depthwise_conv_ops is supposed to be in deps of nn. I've deleted my last commit. Please test again?

Member

meteorcloudy commented Oct 10, 2016

OK, I see. Sounds like depthwise_conv_ops is supposed to be in deps of nn. I've deleted my last commit. Please test again?

@meteorcloudy

This comment has been minimized.

Show comment
Hide comment
@meteorcloudy

meteorcloudy Oct 11, 2016

Member

@mrry Are you OK with keeping depthwise_conv_ops in deps of nn? If so, I think this PR is ready to be merged.

Member

meteorcloudy commented Oct 11, 2016

@mrry Are you OK with keeping depthwise_conv_ops in deps of nn? If so, I think this PR is ready to be merged.

@mrry

This comment has been minimized.

Show comment
Hide comment
@mrry

mrry Oct 11, 2016

Contributor

@meteorcloudy Yes, that's fine. I suspect our existing build rules are not as clean as they could be for this op, but that shouldn't derail what you're trying to do :).

I'll go ahead and merge now.

Contributor

mrry commented Oct 11, 2016

@meteorcloudy Yes, that's fine. I suspect our existing build rules are not as clean as they could be for this op, but that shouldn't derail what you're trying to do :).

I'll go ahead and merge now.

@mrry mrry assigned mrry and unassigned jmchen-g Oct 11, 2016

@mrry mrry merged commit 1975cd1 into tensorflow:master Oct 11, 2016

8 checks passed

Android Demo App SUCCESS
Details
Linux CPU Tests SUCCESS
Details
Linux CPU Tests (Python 3) SUCCESS
Details
Linux GPU SUCCESS
Details
MacOS CPU Tests SUCCESS
Details
Sanity Checks SUCCESS
Details
ci.tensorflow.org SUCCESS
Details
cla/google All necessary CLAs are signed
@meteorcloudy

This comment has been minimized.

Show comment
Hide comment
@meteorcloudy

meteorcloudy Oct 11, 2016

Member

Nice, thanks! \o/

Member

meteorcloudy commented Oct 11, 2016

Nice, thanks! \o/

@rohan100jain

This comment has been minimized.

Show comment
Hide comment
@rohan100jain

rohan100jain Oct 15, 2016

Member

This change is causing problems in testing now with BUILD rules. Apparently select() and glob() don't play nicely with each other.

Member

rohan100jain commented Oct 15, 2016

This change is causing problems in testing now with BUILD rules. Apparently select() and glob() don't play nicely with each other.

@meteorcloudy

This comment has been minimized.

Show comment
Hide comment
@meteorcloudy

meteorcloudy Oct 15, 2016

Member

@rohan100jain Can you show me the error message? I wonder why it's not causing problem before.. Users are beginning to try Windows build with Bazel now, reverting this will completely break them.

Member

meteorcloudy commented Oct 15, 2016

@rohan100jain Can you show me the error message? I wonder why it's not causing problem before.. Users are beginning to try Windows build with Bazel now, reverting this will completely break them.

@rohan100jain

This comment has been minimized.

Show comment
Hide comment
@rohan100jain

rohan100jain Oct 15, 2016

Member

Even after fixing the bug at https://github.com/tensorflow/tensorflow/pull/4796/files#diff-15609f78c0cd35bfdc31f9d8bd9b1e91R984

there are failures like:

https://ci.tensorflow.org/job/tensorflow-cl-sanity/14995/console

I tried various combinations of glob() and select() and none seem to work.

Member

rohan100jain commented Oct 15, 2016

Even after fixing the bug at https://github.com/tensorflow/tensorflow/pull/4796/files#diff-15609f78c0cd35bfdc31f9d8bd9b1e91R984

there are failures like:

https://ci.tensorflow.org/job/tensorflow-cl-sanity/14995/console

I tried various combinations of glob() and select() and none seem to work.

@bssrdf

This comment has been minimized.

Show comment
Hide comment
@bssrdf

bssrdf Oct 16, 2016

Just want to follow up on this. I managed to build label_image example using the same procedure. Have to fix a bug at bssrdf@58f93bd#diff-f6254560279f78a9d74e66dfd2d85977 for it to build on win. But it failed to run due to some file path issue

$ bazel-bin/tensorflow/examples/label_image/label_image
E C:\tools\msys64\var\tmp\Bazel\s5nfxnhX\execroot\tensorflow\tensorflow\examples\label_image\main.cc:279] Not found: Failed to load compute graph at 'tensorflow/examples/label_image/data/tensorflow_inception_graph.pb'

although the file tensorflow_inception_graph.pb is present in that directory.

bssrdf commented Oct 16, 2016

Just want to follow up on this. I managed to build label_image example using the same procedure. Have to fix a bug at bssrdf@58f93bd#diff-f6254560279f78a9d74e66dfd2d85977 for it to build on win. But it failed to run due to some file path issue

$ bazel-bin/tensorflow/examples/label_image/label_image
E C:\tools\msys64\var\tmp\Bazel\s5nfxnhX\execroot\tensorflow\tensorflow\examples\label_image\main.cc:279] Not found: Failed to load compute graph at 'tensorflow/examples/label_image/data/tensorflow_inception_graph.pb'

although the file tensorflow_inception_graph.pb is present in that directory.

@meteorcloudy

This comment has been minimized.

Show comment
Hide comment
@meteorcloudy

meteorcloudy Oct 17, 2016

Member

@rohan100jain Hmm.. This is weird.
Here's what I tested:
I reset my branch to c20da14, which is just before reverting this change.
And I ran bazel build --nobuild tensorflow/... on Mac, Linux and Windows. It all passed.

For glob() and select(), as long as you don't use select as arguments of glob, they will work just fine. So I'm not sure why it's failing like that on ci.

Can you give me some instructions to reproduce this error?

Member

meteorcloudy commented Oct 17, 2016

@rohan100jain Hmm.. This is weird.
Here's what I tested:
I reset my branch to c20da14, which is just before reverting this change.
And I ran bazel build --nobuild tensorflow/... on Mac, Linux and Windows. It all passed.

For glob() and select(), as long as you don't use select as arguments of glob, they will work just fine. So I'm not sure why it's failing like that on ci.

Can you give me some instructions to reproduce this error?

@meteorcloudy

This comment has been minimized.

Show comment
Hide comment
@meteorcloudy

meteorcloudy Oct 17, 2016

Member

I believe there is nothing wrong with this PR, but I am not sure how to get it back. Another PR or can you revert the revert commit? @mrry @rohan100jain

Member

meteorcloudy commented Oct 17, 2016

I believe there is nothing wrong with this PR, but I am not sure how to get it back. Another PR or can you revert the revert commit? @mrry @rohan100jain

@mrry

This comment has been minimized.

Show comment
Hide comment
@mrry

mrry Oct 17, 2016

Contributor

I think the best way forward is another PR. Should be quick to review if it's the same as the last one :).

Contributor

mrry commented Oct 17, 2016

I think the best way forward is another PR. Should be quick to review if it's the same as the last one :).

@meteorcloudy

This comment has been minimized.

Show comment
Hide comment
@meteorcloudy

meteorcloudy Oct 17, 2016

Member

OK, PR sent. #5009

Member

meteorcloudy commented Oct 17, 2016

OK, PR sent. #5009

@meteorcloudy meteorcloudy deleted the meteorcloudy:windows-build-example-trainer branch Mar 29, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment