Optimizers in the C++ API - Issue 9837 #11377
Conversation
Can one of the admins verify this patch?
@@ -0,0 +1,33 @@
/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
caisq
Jul 9, 2017
Contributor
- Same below.
return tensorflow::ops::ApplyGradientDescent(scope.NewSubScope("update"),
    {var},
    tensorflow::ops::Cast(scope.NewSubScope("learning_rate"),
        learning_rate_,
caisq
Jul 9, 2017
Contributor
Please fix indentation and conform to Google C++ style:
https://google.github.io/styleguide/cppguide.html
You can also find instructions on how to run clang-tidy on the code here:
https://github.com/tensorflow/tensorflow/blob/master/CONTRIBUTING.md#c-coding-style
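For reference, a Google-style layout of the quoted call would look roughly like the sketch below (4-space continuation indents, 80-column limit; the trailing Cast arguments are truncated in the quoted diff, so they stay elided here too, and clang-format remains the source of truth):

// Sketch only: the quoted call reformatted along Google C++ style lines.
// The final Cast arguments are cut off in the quoted diff and stay elided.
return tensorflow::ops::ApplyGradientDescent(
    scope.NewSubScope("update"), {var},
    tensorflow::ops::Cast(scope.NewSubScope("learning_rate"),
                          learning_rate_, /* ... */));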
namespace tensorflow {
typedef std::vector<std::tuple<Output, Output>> GradAndVar;
caisq
Jul 9, 2017
Contributor
Should this be moved into class Optimizer?
// the forward node should be the same for the test and expected scope
// TODO(theflofly): merge Const and Assign using one constructor as in python
auto x = Variable(scope.WithOpName("x"), {2, 2}, DT_FLOAT);
auto assign_x = Assign(scope.WithOpName("Assign_x"), x, Const(scope, {{1.0f, 2.0f}, {3.0f, 4.0f}}));
caisq
Jul 9, 2017
Contributor
C++ style issue.
std::vector<Tensor> outputs;
TF_CHECK_OK(session.Run({layer_1}, &outputs));

test::ExpectTensorEqual<float>(outputs[0], test::AsTensor<float>({-0.66430414, -0.95039594, -0.99360687}, {3, 1}));
caisq
Jul 9, 2017
Contributor
This should probably use ExpectTensorNear to be more robust.
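A minimal sketch of the suggested change, using the ExpectTensorNear overload that takes an absolute tolerance (the 1e-5 value is an illustrative choice, not from the PR):

// Compare with an absolute tolerance instead of exact float equality.
// The tolerance (1e-5) is an arbitrary illustrative value.
test::ExpectTensorNear<float>(
    outputs[0],
    test::AsTensor<float>({-0.66430414, -0.95039594, -0.99360687}, {3, 1}),
    1e-5);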
}

}  // namespace
}  // namespace tensorflow
caisq
Jul 9, 2017
Contributor
C++ style issue. Every file needs a line break at the end.
std::vector<Output> var_list;

for (Node* node : scope.graph()->nodes()) {
  if (::tensorflow::grappler::IsVariable(node->def())) {
caisq
Jul 9, 2017
Contributor
This should skip untrainable variables. Need a unit test for that as well.
theflofly
Jul 10, 2017
Author
Contributor
Do you know how I can do that? We don't have a collection of trainable vars, as opposed to the Python version, where we use ops.GraphKeys.TRAINABLE_VARIABLES.
suharshs
Jul 12, 2017
Contributor
I don't think the c_api has support for trainable variables (not sure what the plan is). I think for now we can add a TODO for this. @asimshankar may know more.
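A sketch of what that stop-gap could look like in the loop quoted above (the TODO wording is illustrative; there is no trainable-variable API to call yet):

// Collect every variable node in the graph. There is no C++ equivalent
// of GraphKeys.TRAINABLE_VARIABLES yet, hence the TODO below.
std::vector<Output> var_list;
for (Node* node : scope.graph()->nodes()) {
  if (::tensorflow::grappler::IsVariable(node->def())) {
    // TODO(theflofly): skip untrainable variables once the C++ API can
    // distinguish them; for now every variable is treated as trainable.
    var_list.emplace_back(node);
  }
}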
@tensorflow-jenkins test this please
@theflofly please fix the following sanity check failure (https://ci.tensorflow.org/job/tensorflow-pull-requests-sanity/5149/consoleFull):

=== Sanity check step 3 of 8: do_buildifier (buildifier check) ===
Running do_buildifier on 204 files
tensorflow/cc/BUILD  # reformat listsort unsafesort sort:cc_library.srcs sort:tf_cc_test.deps
buildifier took 0 s
FAIL: buildifier found errors and/or warnings in above BUILD files.
602a603
604,605c605
I made edits following your comments. Just the part about checking that the variables are trainable: I don't know exactly how we can do it, as we don't have a list of trainable vars currently. Do you have an idea?
@theflofly Thanks for addressing my comments. Regarding trainable vs untrainable variables, there are untrainable variables in the Python API, such as global step. I'm not entirely sure whether this is also the case in the C++ API. @suharshs is the expert on this topic and he should be able to provide more insight.
Also I'll complete the API guide explaining how to use the Optimizer in the C++ API once the code and method names are validated.
Hi, did you compile the C++ code with g++ or bazel?
@yjl9122 Both, I'd say; bazel is a build tool, not a compiler. I am using macOS and Ubuntu, so bazel uses either llvm or gcc depending on the OS (I guess). FYI: debugging with llvm and bazel is currently not working (bazelbuild/bazel#2537).
I don't think the c_api has support for trainable variables (not sure what the plan is), because it doesn't have collections AFAIK. I think for now we can add a TODO for this.
Can anybody tell me the current status of this PR? Is it correct that the problem is that we do not know whether a variable is a trainable one or not? Can anyone share how it is done in Python? I am asking because I am about to start tackling the issue and have some ideas of how to do it, but first I would like to know the current status.
@DimanNe the current status is: we are waiting for someone from Google to review the code and merge. I don't see untrainable variables in the C++ API, do you? Even if there were, the check would be nice but not mandatory. If there are such variables, I will do the check (don't bother yourself :)). I don't think there is anything to add to this PR unless someone from Google says otherwise. What do you think? @asimshankar could you review this PR please?
@theflofly: Sorry for the delay. I will plan on taking a look at this later this week.
@asimshankar: No worries, FYI I will not be available for the next 7 days.
@asimshankar Now available :)
@asimshankar could you review this, please?
@theflofly: Apologies for the late response here. Honestly, I was struggling with the choice between adding optimizers to each language API (i.e., this one for C++ and other implementations for other languages) or figuring out a scheme to share the optimizers across languages (for example, by providing a C API for them that all language bindings, including Python, would build on). The fear with implementing higher-level constructs in each language is that we will be unable to provide adequate support for all, and, more importantly, that they will diverge over time. Given that, I'm tempted not to merge this PR. Do you think it could be packaged as an "extension" in your own GitHub repository?
@asimshankar: Surprising. In that case, shouldn't tensorflow/cc/gradients be removed too? This PR merely makes calls to already-existing code in the gradients directory.
@asimshankar: Also, as the TensorFlow core (kernels) is developed in C++, shouldn't the base Optimizers be too? Why C? Three months ago you clearly gave a go-ahead for this functionality, and here we are. I would be surprised if at Google you rewrote all the work done in Python and then used only bindings for Python itself, even though that would clearly be the best solution: maintaining one implementation rather than one per language (and it should have been done from the beginning). Is this on the roadmap or just a thought for now? I mean, I am totally okay with discarding this PR for a greater good, even working on that greater good.
@josh11b can you comment on this?
FYI @josh11b and @asimshankar are both on vacation. Asim should be back sometime next week and Josh in two weeks.
@theflofly: Firstly, I'd like to thank you for the PR and for pursuing this. I think there is some misunderstanding, so let me try to clarify my postings a bit. There were two points I wanted to make:
Regarding other points you have raised since: I would argue that the C++ gradients should not be removed, precisely because they provide a path to constructing the gradient graph in other languages.

Regarding this PR, it would be helpful to chart out the full plan in a document before worrying specifically about the C++ code for a gradient descent optimizer. Again, it's okay if we don't get to implementing everything right away, but it would be helpful to have a design and plan charted out and agreed upon. That would also allow others in the community interested in this area to contribute productively. Does that sound reasonable to you? If so, I'm happy to engage on charting a path out in a document. If not, I'd be glad to hear your concerns.
@asimshankar: Thanks for answering while you are (apparently) on vacation. If the C++ gradients are used by other languages, then yes, it makes sense to keep them (the C thing confused me).
@asimshankar Because I have C++ code I want to reuse in a new project, I don't know whether it is worth rewriting old code for Python or waiting for the C++ API in order to use TensorFlow. Is there an estimated time for when the C++ API is going to be usable for training? How can we contribute to this end?
[UNRELATED TO THE PR] @dguerra I will focus on the missing gradient operations. Once the missing gradients are added, training is possible, just not as easy.
@theflofly Once you have one example of a gradient implementation for a binary operation, I'll be happy to add more operations myself. Being able to train any kind of neural network (albeit very simple ones) would be a nice start.
[UNRELATED TO THE PR] @dguerra Someone beat me to it. @rmlarsen merged 19 hours ago. You now have:

in

The thing that bothers me is that, for instance, in core we have:

and in the cc/gradients we have:

So I am wondering if there is a way to use the kernel code instead of redefining it in the gradient API.
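For readers following along: a gradient in tensorflow/cc/gradients is a graph-construction function registered against an op name, so it re-expresses the math with regular ops rather than reusing the kernel, which is exactly the duplication being pointed out. Roughly (a simplified sketch of the pattern in math_grad.cc, not the verbatim file contents):

// Simplified from the registration pattern in tensorflow/cc/gradients.
// The gradient rebuilds dy/dx = 2 * x out of ordinary ops instead of
// calling into the kernel implementation.
Status SquareGrad(const Scope& scope, const Operation& op,
                  const std::vector<Output>& grad_inputs,
                  std::vector<Output>* grad_outputs) {
  auto two = Cast(scope, Const(scope, 2), op.input(0).type());
  auto dydx = Mul(scope, two, op.input(0));
  grad_outputs->push_back(Mul(scope, grad_inputs[0], dydx));
  return scope.status();
}
REGISTER_GRADIENT_OP("Square", SquareGrad);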
[UNRELATED TO THE PR] @dguerra: Actually, it still does not work because of the Assign node. I made some changes to gradients.cc in this PR; I'll make another PR with only these changes so that we can use AddSymbolicGradients.
@theflofly It works well now for me. I tried both methods, AddSymbolicGradients and the explicit gradient formula, and they give the same results. In my example the "Assign" node is only called once, for initialization, so there is no need to go through it when backpropagating the error. I will now go for a neural network with a few layers and then convolutional nets to see what happens.
[UNRELATED TO THE PR] @dguerra: Really?

then are you doing this:
@theflofly yes sure, I've used variables as inputs to the ops. Shall we start a new thread or platform for conversation about general usage and development of the TensorFlow C++ API?
[UNRELATED TO THE PR] @dguerra: I see :). It is because you are creating the Assign node after the call to AddSymbolicGradients; since it is not there during gradient graph creation, there is no problem. Put it right after the declaration of W and it will not work anymore. Was it on purpose or random luck? My code for a var looks more like:
In that case it does not work, so I'll make a PR anyway. I don't know where we could talk about that; I'll add [UNRELATED TO THE PR] before my comments, so that the person reviewing the PR can focus on the important ones.
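To make the ordering issue concrete, a minimal sketch (the names and the tiny model are illustrative, assuming the usual using-declarations for tensorflow::ops; this is the shape of the failing case, not either author's actual code):

// Declaring Assign immediately after the variable, before gradients are
// added, reproduces the problem discussed above: AddSymbolicGradients
// must then traverse the Assign node.
Scope scope = Scope::NewRootScope();
auto w = Variable(scope, {2, 2}, DT_FLOAT);
auto assign_w = Assign(scope, w, Const(scope, {{1.0f, 0.0f}, {0.0f, 1.0f}}));
auto y = MatMul(scope, w, Const(scope, {{1.0f}, {1.0f}}));

std::vector<Output> grad_outputs;
TF_CHECK_OK(AddSymbolicGradients(scope, {y}, {w}, &grad_outputs));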
@theflofly Variable initialization should go in a different branch of the graph, so that when you "run" the initialization part it does not execute the rest, and vice versa. I think it is done in a similar manner in Python. I haven't been able to apply random initialization to a variable yet. Have you?
[UNRELATED TO THE PR] @dguerra: Did you try https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/random-normal ? In Python we call
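A sketch of the C++ counterpart (assuming the usual tensorflow::ops using-declarations; RandomNormal draws from a standard normal, so scaling by a constant stands in for the stddev parameter that Python exposes):

// Initialize a variable from a random-normal draw. The 0.1f scale is an
// illustrative stand-in for Python's stddev argument.
auto w = Variable(scope, {2, 2}, DT_FLOAT);
auto random = RandomNormal(scope, {2, 2}, DT_FLOAT);
auto init_w = Assign(scope, w, Mul(scope, random, Const(scope, 0.1f)));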
…o optimizer-cc-issue-9837
@theflofly some kind of variable initializer would be a useful addition. It would also be useful to add some kind of function to get the set of trainable variables in the graph, such as trainable_variable() or get_collection() in Python. What do you think?
@dguerra: Totally. I'll maybe work on it when I am done with my gradient operations.
Catching up on this now that I'm back. It seems like there has been a bunch of discussion unrelated to this specific PR :). Perhaps, as @dguerra suggested, this discussion can be moved off this PR into a separate issue? @theflofly: Responses to your comments above:
A technical detail regarding variables: we should be using resource variables.

Regarding this PR itself: even if we were to ignore the C API and other languages and think only of a C++ Optimizer, there are some broader design considerations that need to be thought through: sparse gradients, slots, reference variables. A starting point might be a write-up that looks at the features required to implement the various optimizers that exist in Python today and their handling of dense and sparse variables. As you've noted, any PRs to help improve coverage of the C++ gradient registry are also greatly appreciated.
@asimshankar: OK. Maybe you should create the document (Google Drive, I guess :)) and add me on it; my email address is floriancourtial@gmail.com. Also, shouldn't I close this PR? Because the end result will be far away from what has been done here, I'd say.
Thanks for your understanding @theflofly. Will close the PR for now and keep you in the loop.
@asimshankar Should I close the original issue #9837 or keep it to track the work in progress on Optimizers? |
This pull request adds the Optimizer base class and the GradientDescentOptimizer class to the C++ API.
More details here: #9837
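As a rough illustration of what the PR wires together, here is a sketch built only from existing APIs named in this thread: symbolic gradients plus one ApplyGradientDescent node per variable (the tiny model and names are illustrative, assuming the usual tensorflow::ops using-declarations; this is not the PR's own Optimizer interface):

// What a gradient-descent step reduces to without the Optimizer class:
// build the forward graph, ask for symbolic gradients, then attach one
// ApplyGradientDescent node per variable.
Scope scope = Scope::NewRootScope();
auto w = Variable(scope, {3, 1}, DT_FLOAT);
auto assign_w = Assign(scope, w, RandomNormal(scope, {3, 1}, DT_FLOAT));
auto x = Const(scope, {{1.0f, 2.0f, 3.0f}});
auto loss = Sum(scope, Square(scope, Tanh(scope, MatMul(scope, x, w))), {0, 1});

std::vector<Output> grads;
TF_CHECK_OK(AddSymbolicGradients(scope, {loss}, {w}, &grads));
auto update = ApplyGradientDescent(scope, w, Const(scope, 0.01f), grads[0]);
// Running {assign_w} once and then {update} repeatedly in a ClientSession
// performs gradient-descent training.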