Optimizers in the C++ API - Issue 9837 #11377

Closed
theflofly wants to merge 6 commits into base: master
Conversation

@theflofly
Contributor

theflofly commented Jul 8, 2017

This pull request adds the Optimizer base class and the GradientDescentOptimizer class to the C++ API.
More details here: #9837

@tensorflow-jenkins


Collaborator

tensorflow-jenkins commented Jul 8, 2017

Can one of the admins verify this patch?

@googlebot googlebot added the cla: yes label Jul 8, 2017

@@ -0,0 +1,33 @@
/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.


@caisq

caisq Jul 9, 2017

Contributor
  1. Same below.
return tensorflow::ops::ApplyGradientDescent(scope.NewSubScope("update"),
{var},
tensorflow::ops::Cast(scope.NewSubScope("learning_rate"),
learning_rate_,


@caisq

caisq Jul 9, 2017

Contributor

Please fix indentation and conform to Google C++ style.
https://google.github.io/styleguide/cppguide.html

You can also find instructions on how to run clang-tidy on the code here:
https://github.com/tensorflow/tensorflow/blob/master/CONTRIBUTING.md#c-coding-style


namespace tensorflow {

typedef std::vector<std::tuple<Output, Output>> GradAndVar;


@caisq

caisq Jul 9, 2017

Contributor

Should this be moved into class Optimizer?

// the forward node should be the same for the test and expected scope
// TODO(theflofly): merge Const and Assign using one constructor as in python
auto x = Variable(scope.WithOpName("x"), {2, 2}, DT_FLOAT);
auto assign_x = Assign(scope.WithOpName("Assign_x"), x, Const(scope, {{1.0f, 2.0f}, {3.0f, 4.0f}}));


@caisq

caisq Jul 9, 2017

Contributor

C++ style issue.

std::vector<Tensor> outputs;
TF_CHECK_OK(session.Run({layer_1}, &outputs));

test::ExpectTensorEqual<float>(outputs[0], test::AsTensor<float>({-0.66430414, -0.95039594, -0.99360687}, {3, 1}));


@caisq

caisq Jul 9, 2017

Contributor

This should probably use ExpectTensorNear to be more robust.
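For example, something along these lines (the tolerance value here is just an illustration, not taken from the PR):

test::ExpectTensorNear<float>(
    outputs[0],
    test::AsTensor<float>({-0.66430414f, -0.95039594f, -0.99360687f}, {3, 1}),
    1e-5);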

}

} // namespace
} // namespace tensorflow


@caisq

caisq Jul 9, 2017

Contributor

C++ style issue. Every file needs a line break at the end.

std::vector<Output> var_list;

for (Node* node : scope.graph()->nodes()) {
if (::tensorflow::grappler::IsVariable(node->def())) {


@caisq

caisq Jul 9, 2017

Contributor

This should skip untrainable variables. Need a unit test for that as well.


@theflofly

theflofly Jul 10, 2017

Contributor

Do you know how I can do that? We don't have a collection of trainable variables, as opposed to the Python version, where we use ops.GraphKeys.TRAINABLE_VARIABLES.


@suharshs

suharshs Jul 12, 2017

Member

I don't think the c_api has support for trainable variables (not sure what the plan is). I think for now we can add a TODO for this. @asimshankar may know more.
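For context, a sketch of how the collection loop above might look with that TODO in place; only grappler::IsVariable comes from the PR, while the CollectVariables name and the rest are illustrative:

#include <vector>

#include "tensorflow/cc/framework/ops.h"
#include "tensorflow/cc/framework/scope.h"
#include "tensorflow/core/graph/graph.h"
#include "tensorflow/core/grappler/op_types.h"

namespace tensorflow {

std::vector<Output> CollectVariables(const Scope& scope) {
  std::vector<Output> var_list;
  for (Node* node : scope.graph()->nodes()) {
    // TODO: skip untrainable variables once the C++/C API exposes a
    // trainable-variables collection (see the discussion above).
    if (grappler::IsVariable(node->def())) {
      var_list.emplace_back(node);
    }
  }
  return var_list;
}

}  // namespace tensorflow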

@caisq caisq requested a review from suharshs Jul 9, 2017

@caisq


Contributor

caisq commented Jul 9, 2017

@tensorflow-jenkins test this please

@caisq


Contributor

caisq commented Jul 9, 2017

@theflofly please fix the following sanity check failure (https://ci.tensorflow.org/job/tensorflow-pull-requests-sanity/5149/consoleFull):

=== Sanity check step 3 of 8: do_buildifier (buildifier check) ===

Running do_buildifier on 204 files

tensorflow/cc/BUILD # reformat listsort unsafesort sort:cc_library.srcs sort:tf_cc_test.deps

buildifier took 0 s

FAIL: buildifier found errors and/or warnings in above BUILD files.
buildifier suggested the following changes:
589d588
< ":testutil",
590a590
> ":testutil",
602a603
> "training/gradient_descent_optimizer.cc",
604,605c605
< "training/gradient_descent_optimizer.cc"
< ],
---
> ],
606a607
> "training/gradient_descent_optimizer.h",
608,609c609
< "training/gradient_descent_optimizer.h"
< ],
---
> ],
617c617
< )
\ No newline at end of file
---
> )
Please fix manually or run buildifier to auto-fix.

@theflofly


Contributor

theflofly commented Jul 10, 2017

I made edits following your comments. The only remaining part is checking that the variables are trainable; I don't know exactly how we can do that, as we don't currently have a list of trainable variables. Do you have an idea?

@caisq


Contributor

caisq commented Jul 10, 2017

@theflofly Thanks for addressing my comments. Regarding trainable vs untrainable variables, there are untrainable variables in the Python API, such as global step. I'm not entirely sure whether this is also the case in the C++ API. @suharshs is the expert on this topic and he should be able to provide more insight.

@theflofly


Contributor

theflofly commented Jul 11, 2017

Also, I'll complete the API guide explaining how to use the Optimizer in the C++ API once the code and method names are validated.

@jianlong-yuan


jianlong-yuan commented Jul 12, 2017

Hi, did you compile the C++ with g++ or with bazel?

@theflofly


Contributor

theflofly commented Jul 12, 2017

@yjl9122 Both, I'd say; Bazel is a build tool, not a compiler. I am using macOS and Ubuntu, so Bazel uses either LLVM or GCC depending on the OS (I guess). FYI: debugging with LLVM and Bazel does not currently work (bazelbuild/bazel#2537).

@suharshs


Member

suharshs commented Jul 12, 2017

I don't think the c_api has support for trainable variables (not sure what the plan is), because it doesn't have collections AFAIK. I think for now we can add a TODO for this.
@asimshankar may have more information about this.

@suharshs suharshs assigned asimshankar and unassigned suharshs Jul 12, 2017

@DimanNe


Contributor

DimanNe commented Jul 19, 2017

Can anybody tell me the current status of this PR? Is it correct that the problem is that we do not know whether a variable is trainable or not? Can anyone share how it is done in Python?

I am asking because I am about to start tackling the issue and have some ideas about how to do it, but first I would like to know the current status.

@theflofly


Contributor

theflofly commented Jul 20, 2017

@DimanNe the current status is: we are waiting for someone from Google to review the code and merge.

I don't see untrainable variables in the C++ API, do you? Even if there were, the check would be nice but not mandatory. If there are such variables, I will add the check (don't bother yourself :)).

I don't think there is anything to add to this PR unless someone from Google says otherwise. What do you think?

@asimshankar could you review this PR please?

@asimshankar


Member

asimshankar commented Jul 24, 2017

@theflofly : Sorry for the delay. I will plan on taking a look at this later this week.

@theflofly


Contributor

theflofly commented Jul 25, 2017

@asimshankar: No worries, FYI I will not be available for the next 7 days.

@theflofly


Contributor

theflofly commented Aug 2, 2017

@asimshankar Now available :)

@rmlarsen


Member

rmlarsen commented Aug 8, 2017

@asimshankar could you review this, please?

@rmlarsen rmlarsen requested a review from asimshankar Aug 8, 2017

@asimshankar


Member

asimshankar commented Aug 8, 2017

@theflofly : Apologies for the late response here. Honestly, I was struggling with the choice between adding optimizers to each language API (i.e., this one for C++ and other implementations for other languages), or figuring out a scheme to share the optimizers across languages (for example, by providing a C API for them that all language bindings, including Python, will build on).

The fear with implementing higher level constructs in each language is that we will be unable to provide adequate support for all, and more importantly they will diverge over time.

Given that, I'm tempted to not merge this PR.
I really do appreciate the effort you've put into this PR though.

Do you think it could be packaged as an "extension" in your own GitHub repository?

@theflofly


Contributor

theflofly commented Aug 8, 2017

@asimshankar: Surprising. In that case, shouldn't tensorflow/cc/gradients be removed too? This PR is merely calls to already existing code in the gradients directory.

@theflofly


Contributor

theflofly commented Aug 9, 2017

@asimshankar: Also, since the TensorFlow core (kernels) is developed in C++, shouldn't the base Optimizers be too? Why C?

Three months ago you clearly gave a go-ahead for this functionality, and here we are. I would be surprised if at Google you rewrote all the work done in Python and then used only bindings for Python itself, even though that would clearly be the best solution: maintain only one implementation rather than one per language (and it should have been done from the beginning). Is this on the roadmap or just a thought for now?

I mean, I am totally okay with discarding this PR for a greater good, or even working on that greater good.

@rmlarsen


Member

rmlarsen commented Aug 9, 2017

@josh11b can you comment on this?

@skye


Member

skye commented Aug 9, 2017

FYI @josh11b and @asimshankar are both on vacation. Asim should be back sometime next week and Josh in two weeks.

@asimshankar


Member

asimshankar commented Aug 9, 2017

@theflofly : Firstly, I'd like to thank you for the PR and for pursuing this. I think there is some misunderstanding, so let me try to clarify my postings a bit. There were two points I wanted to make:

  • I didn't mean to suggest that optimizers be implemented in C, but that they be accessible from the C API. I meant to suggest that we think through the path to making these features accessible to other language APIs (even if we don't get to implementing it all immediately). Perhaps the right solution still involves your C++ Optimizer class made accessible via C API calls (just like we do for other runtime constructs in the C API implementation), but let's think it through. I do apologize though; this is something I should probably have pointed out earlier in the discussion in #9837.

  • On an unrelated note: The ecosystem would benefit from the ability to have extensions to the TensorFlow APIs built, owned and hosted by contributors so that ideas (particularly in early stages) are not slowed down by unavailable bandwidth from the TensorFlow maintainers. It appeared to me (but perhaps I'm wrong) that this particular feature would have been a great way to demonstrate that as it is purely additive over existing C++ APIs (though, maybe some of the changes you require could be split out into their own PR). But, I do admit that this idea is somewhat orthogonal to the content of the PR itself.

Regarding other points you have raised since: I would argue that the C++ gradients should not be removed precisely because they are providing a path to constructing the gradient graph in other languages (TF_AddGradients). And in fact, internally we have even discussed ways in which gradient functions defined in one language can be made accessible to others. So while the implementation isn't quite there yet (and isn't on the top of the priority list yet), I feel comfortable with the thought put into the plan for gradient functions.

Regarding this PR, it would be helpful to chart out the full plan in a document before worrying about specifically the C++ code for a gradient descent optimizer. Again, it's okay if we don't get to implementing everything right away, but it would be helpful to have a design and plan charted out and agreed upon. That would also allow for others in the community interested in this area to productively contribute.

Does that sound reasonable to you? If so, I'm happy to engage on charting a path out in a document. If not, I'd be glad to hear your concerns.

@theflofly


Contributor

theflofly commented Aug 9, 2017

@asimshankar: Thanks for answering even though you are (apparently) on vacation. If the C++ gradients are used by other languages, then yes, it makes sense to keep them (the C thing confused me).

  • I am okay with drafting a document, even if I am not sure about the question it will answer; from my understanding it would be something like "Steps to create base gradients and optimizers to be used by other languages"?

  • Shouldn't the gradients (and, in the future, optimizers) be moved into the core dir so it is clear that they are the base and that other languages should use them (this can be answered in the document)?

  • As for the plugin idea, I agree for exotic things such as LBFGS, but basic stuff like backprop (and, at a higher level, optimizers) should be part of the TensorFlow project, otherwise you can't train using the C++ API.

  • Finally, what should we do about this PR?

@dguerra


dguerra commented Aug 10, 2017

@asimshankar I have C++ code I want to reuse in a new project, and I don't know whether it is worth rewriting the old code for Python or waiting for the C++ API in order to use TensorFlow. Is there an estimated time for when the C++ API will be usable for training? How can we contribute to this end?

@theflofly


Contributor

theflofly commented Aug 10, 2017

[UNRELATED TO THE PR]

@dguerra I will focus on the missing gradient operations. Once the missing gradients are added, training is possible, just not as easy.

@dguerra


dguerra commented Aug 16, 2017

@theflofly Once you have one example of a gradient implementation for a binary operation, I'll be happy to add more operations myself. Being able to train any kind of neural network (even very simple ones) would be a nice start.

@theflofly


Contributor

theflofly commented Aug 16, 2017

[UNRELATED TO THE PR]

@dguerra Someone beat me to it. @rmlarsen merged it 19 hours ago. You now have:

  • AddGrad
  • SubGrad
  • MulGrad
  • DivGrad
  • RealDivGrad
  • SquaredDifferenceGrad

in tensorflow/cc/gradients/math_grad.cc. Your code should now run smoothly.

The thing that bothers me is that, for instance, in core we have:

Status AddGrad(const AttrSlice& attrs, FunctionDef* g) {
  // clang-format off
  return GradForBinaryCwise(g, {
      {{"gx"}, "Identity", {"dz"}},
      {{"gy"}, "Identity", {"dz"}},
  });
  // clang-format on
}
REGISTER_OP_GRADIENT("Add", AddGrad);

and in cc/gradients we have:

Status AddGrad(const Scope& scope, const Operation& op,
               const std::vector<Output>& grad_inputs,
               std::vector<Output>* grad_outputs) {
  // y = x_1 + x_2
  // dy/dx_1 = dy/dx_2 = 1
  auto gx_1 = Identity(scope, grad_inputs[0]);
  auto gx_2 = Identity(scope, grad_inputs[0]);
  return BinaryGradCommon(scope, op, grad_outputs, gx_1, gx_2);
}
REGISTER_GRADIENT_OP("Add", AddGrad);

So I am wondering if there is a way to use the kernel code instead of redefining it in the gradient API.

@theflofly


Contributor

theflofly commented Aug 17, 2017

[UNRELATED TO THE PR]

@dguerra: Actually, it still does not work because of the Assign node. I made some changes to gradients.cc in this PR; I'll make another PR with only those changes so that we can use AddSymbolicGradients.

@dguerra


dguerra commented Aug 18, 2017

@theflofly It works well for me now. I tried both methods, AddSymbolicGradients and the explicit gradient formula, and they give the same results. In my example the "Assign" node is only called once, for initialization, so there is no need to go through it when backpropagating the error. I will now go for a neural network with a few layers and then convolutional nets to see what happens.

@theflofly


Contributor

theflofly commented Aug 18, 2017

[UNRELATED TO THE PR]

@dguerra: Really?
Do you give an Assign node or a Var node as input to an op?
If you have:

auto w1 = Variable(scope, {3, 3}, DT_FLOAT);
auto assign_w1 = Assign(scope, w1, Const(scope, ...));

then are you doing Tanh(scope, w1); (1) or Tanh(scope, assign_w1); (2)? If you use the assign_ node as in (2) everywhere, your variables will never change, as the const value is reassigned at each step. If you use the var as in (1), the gradient graph will not be correct because of an error that I solved.
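To spell out the two patterns (just a sketch reusing w1 and assign_w1 from above; y_1 and y_2 are only illustrative names):

auto y_1 = Tanh(scope, w1);         // (1) reads the variable's current value
auto y_2 = Tanh(scope, assign_w1);  // (2) re-runs the Assign at every step, resetting w1 to the Const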
I am replacing the tests from Const to Var and I will do a PR.
If you are able to train a network (add ApplyGradientDescent nodes to your graph) and it works, please share your code, because I may be wrong, but I did not succeed in training without editing gradients.cc.
Also we should stop polluting this PR I guess.

@dguerra


dguerra commented Aug 18, 2017

@theflofly yes sure, I've used variables as inputs to the ops.

Shall we start a new thread or platform for conversation about general usage and development of the TensorFlow C++ API?

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/cc/framework/gradients.h"
#include "tensorflow/core/framework/tensor.h"


int main()
{
  using namespace tensorflow;
  using namespace tensorflow::ops;
  Scope root = Scope::NewRootScope();
  
  auto W = Variable(root.WithOpName("W"), {3,1}, DT_DOUBLE);

  auto x = Placeholder(root.WithOpName("x"), DT_DOUBLE);
  auto y = Placeholder(root.WithOpName("y"), DT_DOUBLE);

  auto d = Subtract(root, y, MatMul(root, x, W));
  
  // Compute gradients
  auto dd = MatMul(root, d, d, MatMul::TransposeA(true));
  // auto half = Const(root, {0.5});
  auto loss = MatMul(root, dd, {{0.5}}); 

  double learning_rate = 0.1;
 

  std::vector<Output> grad_outputs;
  TF_CHECK_OK(AddSymbolicGradients(root, {loss}, {W}, &grad_outputs));

  //Explicit gradient formula:
  //auto grad_W = Subtract(root, MatMul(root, MatMul(root, x, x, MatMul::TransposeA(true)), W), MatMul(root, x, y,  MatMul::TransposeA(true)));
  
  // apply either the output from AddSymbolicGradients or the explicit gradient formula
  auto apply_grad_W = ApplyGradientDescent(root, W, learning_rate,  grad_outputs[0]);

  //Initialize variables
  auto init_W = Assign(root, W, {{1.0},{1.0},{1.0}});

  std::vector<Tensor> outputs;
  ClientSession session(root);

  //Run variable initializers
  session.Run({init_W}, &outputs);

  for(unsigned int i=0;i<200;++i)
  {
    //y = 3.0 * x1 + 4.0 * x2 + 5.0
    TF_CHECK_OK(session.Run( { {x,{{1.0,-1.0,3.0}, {1.0,2.0,1.0}, {1.0,-2.0,-2.0}, {1.0,0.0,2.0}}}, {y,{{14.0}, {15.0}, {-9.0}, {13.0}}} } , {loss, apply_grad_W}, &outputs));
    std::cout << std::string("loss: ") << outputs[0].scalar<double>() << std::endl << std::string("weights: ")<< outputs[1].matrix<double>() << std::endl;
  }
  return 0;
}

@theflofly


Contributor

theflofly commented Aug 18, 2017

[UNRELATED TO THE PR]

@dguerra: I see :). It is because you are creating the Assign node after the call to AddSymbolicGradients; since it is not there during the gradient graph creation, there is no problem.

Put it right after the declaration of W and it will not work anymore. Was that on purpose or random luck?

My code for a var looks more like:


// weights init
auto w1 = Variable(scope, {3, 3}, DT_FLOAT);
auto assign_w1 = Assign(scope, w1, Const(scope, {{0.1f, 0.1f, 0.1f}, {0.2f, 0.2f, 0.2f}, {0.2f, 0.2f, 0.2f}}));

In that case it does not work, so I'll make a PR anyway. I don't know where we could talk about that; I'll add [UNRELATED TO THE PR] before my comments so that the person reviewing the PR can focus on the important ones.

@dguerra


dguerra commented Aug 18, 2017

@theflofly Variable initialization should go in a different branch of the graph, so that when you "run" the initialization part it does not execute the rest, and vice versa. I think it is done in a similar manner in Python. I haven't been able to apply random initialization to a variable yet. Have you?

@theflofly


Contributor

theflofly commented Aug 18, 2017

[UNRELATED TO THE PR]

@dguerra: Did you try https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/random-normal ?

In Python we call w1 = tf.Variable(tf.random_normal([3, 3]), name="W1"), and if you read the code, it says "This constructor creates both a variable Op and an assign Op to set the variable to its initial value." So the Var and Assign ops are added to the graph at the same level, not after the gradients call.
I agree the init run on the apply ops must be done before the run on the graph (and only once). A Variable op in C++ that creates a Var + Assign would be nice, and so would the same kind of mechanism as tf.global_variables_initializer().

@theflofly


Contributor

theflofly commented Aug 18, 2017

[UNRELATED TO THE PR]

@dguerra: The PR is there finally: #12397

@dguerra


dguerra commented Aug 21, 2017

@theflofly Some kind of variable initializer would be a useful addition. It would also be useful to add some kind of function to get the set of trainable variables in the graph, such as trainable_variables() or get_collection() in Python. What do you think?

@theflofly


Contributor

theflofly commented Aug 21, 2017

@dguerra: Totally, I'll maybe work on it when I'm done with my gradient operations.
I guess we should have a Variable(...) that creates a var, a const, and an assign node automatically, all in one call. This Variable(...) would also add to a list of trainable vars and a list of assign nodes. The list of Assign nodes would then be used during the init step.
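A minimal sketch of what I have in mind, assuming the existing Variable and Assign wrappers from tensorflow/cc/ops; the VariableStore type and CreateVariable name are hypothetical, not part of any TensorFlow API:

#include <vector>

#include "tensorflow/cc/ops/standard_ops.h"

using namespace tensorflow;       // matching the style of the examples above
using namespace tensorflow::ops;

// Hypothetical helper: creates a Variable node plus the Assign node that sets
// its initial value, and records both the trainable variables and the init ops.
struct VariableStore {
  std::vector<Output> trainable;  // candidates to hand to an optimizer
  std::vector<Output> init_ops;   // Assign ops to run once before training

  Output CreateVariable(const Scope& scope, const PartialTensorShape& shape,
                        DataType dtype, const Input& initial_value) {
    Output var = Variable(scope, shape, dtype);
    Output init = Assign(scope, var, initial_value);
    trainable.push_back(var);
    init_ops.push_back(init);
    return var;
  }
};

Running the collected init_ops once through a ClientSession before the training loop would then play the role of tf.global_variables_initializer() in Python.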

@asimshankar


Member

asimshankar commented Aug 24, 2017

Catching up on this now that I'm back. Seems like there has been a bunch of discussion unrelated to this specific PR :). Perhaps, as @dguerra suggested, this discussion can be moved off this PR into a separate issue?

@theflofly : Responses to your comments above:

  • The document would outline a plan for making optimizers available in other languages. Most likely this would consist of at least two parts: (1) Detailed design of the C++ API (such as handling of sparse gradients, slots), perhaps even an introduction of a Variable class in the API and (2) Design of a C API for optimizers (which will likely wrap over the C++ API), sample outlines of the implementation in one or two languages to provide confidence that the API is sufficient, discussion of if/how users in other languages can add new optimizers and how they can be made available across languages (or whether doing so requires a C++ implementation), etc. I think it might be best for someone on the TensorFlow team who has experience with other language bindings to start on such a draft and share it here for comments/suggestions/improvements/discussion. I will try to get this going on our end, but do not have a timeline yet.

  • There is some confusion about the gradients in core/ vs cc/; @suharshs knows the details, but long story short, the gradients in cc/ are the ones accessible in other languages. They are the ones picked up by TF_AddGradients. So that mechanism is already set up (the unfortunate directory-structure confusion notwithstanding).

  • I agree that ultimately Optimizers should be part of the API. However, a separate plugin/repository for now was a suggestion to allow for rapid progress that doesn't depend on us (the TensorFlow maintainers) having the bandwidth to provide support for it.

A technical detail regarding variables: we should be using resource variables (class ResourceVariable in Python) instead of reference variables, as they have more clearly defined semantics. We're aiming to switch Python to them as well in upcoming releases. For example, this would mean using ResourceApplyGradientDescent instead of the ApplyGradientDescent operation. I apologize that this distinction isn't well documented (it has been mostly an implementation detail, but now that we're seeing contributions for other languages, it is an implementation detail that more people will be interested in :)).
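A rough sketch of that substitution, assuming the generated C++ wrappers for the resource-variable ops (VarHandleOp, AssignVariableOp, ResourceApplyGradientDescent) are available to the C++ client; if they are not generated yet, this only illustrates the intended shape, and the gradient values are stand-ins:

#include <vector>

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/resource_variable_ops.h"  // assumed to be generated
#include "tensorflow/cc/ops/standard_ops.h"

int main() {
  using namespace tensorflow;
  using namespace tensorflow::ops;
  Scope root = Scope::NewRootScope();

  // A resource handle instead of a reference Variable node.
  auto w = VarHandleOp(root, DT_FLOAT, {3, 1});
  auto init_w = AssignVariableOp(root, w, Const(root, {{1.0f}, {1.0f}, {1.0f}}));

  // Stand-in gradient; in a real program this would come from AddSymbolicGradients.
  auto grad = Const(root, {{0.1f}, {0.2f}, {0.3f}});

  // The resource counterpart of the ApplyGradientDescent op used above.
  auto train = ResourceApplyGradientDescent(root, w, 0.01f, grad);

  ClientSession session(root);
  std::vector<Tensor> unused;
  TF_CHECK_OK(session.Run({}, {}, {init_w}, &unused));
  TF_CHECK_OK(session.Run({}, {}, {train}, &unused));
  return 0;
}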

Regarding this PR itself: Even if we were to ignore the C API and other languages and think only of a C++ Optimizer, there are some broader design considerations that need to be thought through - sparse gradients, slots, reference variables.

A starting point might be a write-up that looks at the features required to implement the various optimizers that exist in Python today and their handling of dense and sparse variables.

As you've noted, any PRs to help improve coverage of the C++ gradient registry are also greatly appreciated.

@theflofly


Contributor

theflofly commented Aug 24, 2017

@asimshankar: OK. Maybe you should create the document (Google Drive, I guess :)) and add me to it; my email address is floriancourtial@gmail.com. Also, shouldn't I close this PR? The end result will be far from what has been done here, I'd say.

@asimshankar


Member

asimshankar commented Aug 25, 2017

Thanks for your understanding @theflofly. Will close the PR for now and keep you in the loop.
In the meantime, please do feel encouraged to make other contributions like you've been doing with the gradient functions and the bug fixes. Much appreciated.

@theflofly


Contributor

theflofly commented Aug 25, 2017

@asimshankar Should I close the original issue #9837 or keep it to track the work in progress on Optimizers?
