[Java] Support addition of gradient operations in a graph #20133
Conversation
Thanks a bunch!
 * }</pre>
 */
@Operator
public class AddGradients implements Op, Iterable<Operand<?>> {
Would it make sense for this to be called Gradients instead of AddGradients? (We don't call the other Op implementations AddConstant or AddMatMul.)
 * @param dx
 * @return the partial derivatives {@code dy} with the size of {@code x}
 */
public Output<?>[] addGradients(Output<?>[] y, Output<?>[] x, Output<?>[] dx) {
The common case would be a single y and null for dx, so should we add another method for that common case? Something like:
public Output<?>[] addGradients(Output<?> y, Output<?>... x)
And similarly for the Op implementation.
Sounds very good to me. Would it matter if we do that modification only in the Op class? I see this kind of interface optimization as the responsibility of the Ops API layer, while the core classes focus more on implementation details.
I also worked on another Op that adds gradient nodes to the graph and immediately applies the descent on the input tensors, instead of doing this manually in n steps; should I also go ahead with that one (perhaps in another PR)?
I don't feel terribly strongly, but have a mild preference to include the simple override in the Graph class as well.
Is the second Op you mentioned basically the equivalent of Optimizer.minimize() in Python? We can talk about that in a follow-up, but as you suggested, not in this PR.
 * If {@code dx} is null, the implementation will use dx of {@code OnesLike} for all
 * shapes in {@code y}.
 *
 * @param y
Fill in the documentation for these arguments?
long[] dxHandles = null;
int[] dxIndices = null;

for (int i = 0; i < y.length; ++i) {
Should the body of this function be enclosed in a:
Reference ref = ref();
try {
  // The main body of this method
} finally {
  ref.close();
}
so that it remains thread-safe (i.e., a concurrent call to Graph.close() won't mess things up by rendering the elements of the *Handles arrays invalid)?
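For illustration, a minimal, self-contained sketch of the reference-counting idea behind that pattern, using stand-in Graph and Reference types rather than the real TensorFlow classes: close() blocks until all outstanding references are released, so native handles used inside a ref() block stay valid for the duration of that block.

```java
public class RefGuardSketch {
  static class Graph {
    private final Object lock = new Object();
    private int refCount = 0;
    private boolean closed = false;

    // Holding a Reference keeps the graph (and its native handles) alive.
    class Reference implements AutoCloseable {
      Reference() { synchronized (lock) { refCount++; } }
      @Override public void close() {
        synchronized (lock) { refCount--; lock.notifyAll(); }
      }
    }

    Reference ref() { return new Reference(); }

    // close() waits until no references remain, so a concurrent close()
    // cannot invalidate handles while a ref() block is still running.
    void close() {
      synchronized (lock) {
        while (refCount > 0) {
          try { lock.wait(); } catch (InterruptedException e) { return; }
        }
        closed = true;
      }
    }

    boolean isClosed() { return closed; }
  }

  public static void main(String[] args) {
    Graph g = new Graph();
    try (Graph.Reference ref = g.ref()) {
      // the body of addGradients would use the native handle here
    }
    g.close();
    System.out.println(g.isClosed());
  }
}
```

The real Graph class is assumed to provide ref()/Reference with similar semantics; this stub only demonstrates the synchronization shape.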
// e.g. given xHandles = [x0Handle, x1Handle, ...] and xIndices = [x0Index, x1Index, ...], we obtain
// dy = [dy0Handle, dy1Handle, ..., dy0Index, dy1Index, ...]
long[] dyHandlesAndIndices;
synchronized (nativeHandleLock) {
If we do the ref() block as suggested above, then we don't need this, but would instead use ref.nativeHandle() for the call to addGradients.
@Test
public void addGradientsComputationOpsToGraph() {
  try (Graph g = new Graph()) {
    Output<Integer> a = TestUtil.constant(g, "A", new int[][] {{1},{2}});
Let's use Float instead of Integer, as integer gradients are a bit iffy (in fact, we don't backprop through integer tensors in Python anymore - f637506; yes, yes, I know, it would be great if we were able to share that logic more easily :). But for now, let's at least have the example use a more realistic Float?
Output<Integer> ab = TestUtil.matmul(g, "AxB", a, b, false, false);
Output<Integer> abc = TestUtil.matmul(g, "AxBxC", ab, c, false, false);

Output<?>[] grad = g.addGradients(new Output<?>[] {abc}, new Output<?>[] {b, c}, null);
Can we improve test coverage here, perhaps by adding to SessionTest.java, so that we also test the additional arguments (like dys)?
Thanks for the update, apologies for the delay. A few more minor things, otherwise looks great!
.fetch(grads[0])
.fetch(grads[1])
.run();
To encourage best practices, should we also call .close() on the elements of outputs?
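A small self-contained sketch of that cleanup pattern, using a stand-in AutoCloseable Tensor type to mirror org.tensorflow.Tensor (hypothetical helper name):

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOutputsSketch {
  // Stand-in for org.tensorflow.Tensor, which is AutoCloseable and owns
  // native memory that should be released promptly.
  static class Tensor implements AutoCloseable {
    boolean closed = false;
    @Override public void close() { closed = true; }
  }

  // Close every element; keep going even if one close() fails.
  static void closeAll(List<Tensor> outputs) {
    for (Tensor t : outputs) {
      try { t.close(); } catch (RuntimeException ignored) { }
    }
  }

  public static void main(String[] args) {
    List<Tensor> outputs = new ArrayList<>();
    outputs.add(new Tensor());
    outputs.add(new Tensor());
    closeAll(outputs);
    System.out.println(outputs.get(0).closed && outputs.get(1).closed);
  }
}
```

In the test itself, wrapping the loop in a finally block (or using try-with-resources per tensor) gives the same guarantee.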
Output<?>[] grads = g.addGradients(toArray(y0, y1), toArray(x), null);

List<Tensor<?>> outputs = s.runner()
Same here:
try (Tensor<?> t = s.runner()....get(0)) {
assertEquals(114.0f, t.floatValue(), 0.0f);
}
?
 * @param dx if not null, the partial derivatives of some loss function {@code L} w.r.t. {@code y}
 * @return the partial derivatives {@code dy} with the size of {@code x}
 */
public Output<?>[] addGradients(Output<?>[] y, Output<?>[] x, Output<?>[] dx) {
Is it really ?? Don't all of the tensors have to have the same type? In which case, should this be:
public Output<T>[] addGradients(Output<T>[] y, Output<T>[] x, Output<T>[] dx)
?
I wasn't sure about this; you probably know better than me: is it possible to have a graph with variables of different types? If it is guaranteed that this can never happen, I'll gladly remove those wildcards to enforce type safety.
Hmm... it actually is, since one could have, say, tf.casts on the path between x and y, something like:
import tensorflow as tf
x = tf.placeholder(tf.float64)
y = tf.square(tf.cast(x, tf.float32))
dy = tf.gradients(y, x)[0]
print(x.dtype, y.dtype, dy.dtype)
So apologies, ignore my comment :)
Output<?>[] grads = g.addGradients(toArray(y), toArray(x), toArray(dx));

List<Tensor<?>> outputs = s.runner()
Ditto about try-with-resources for the returned Tensor.
@@ -36,7 +36,7 @@
    .<T>output(0);
}

public static Output<?> addN(Graph g, Output<?>... inputs) {
public static <T> Output<T> addN(Graph g, Output<?>... inputs) {
Should the argument be Output<T> instead of Output<?> also?
Unfortunately, the compiler complains with a warning when using a parameterized type for a varargs parameter.
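For context, a self-contained sketch of the issue: a parameterized varargs parameter triggers an "unchecked generic array creation" warning at call sites, and @SafeVarargs (valid on static, final, or private methods) suppresses it when the method doesn't misuse the implicit array. The stand-in Output type below is illustrative, not the real TensorFlow class.

```java
public class VarargsSketch {
  // Stand-in for org.tensorflow.Output<T>.
  static class Output<T> {
    final T value;
    Output(T value) { this.value = value; }
  }

  // Without @SafeVarargs, calls like first(a, b) would warn about
  // unchecked generic array creation for the Output<T>[] varargs array.
  @SafeVarargs
  static <T> Output<T> first(Output<T>... inputs) {
    return inputs[0];
  }

  public static void main(String[] args) {
    Output<Float> a = new Output<>(1.0f);
    Output<Float> b = new Output<>(2.0f);
    System.out.println(first(a, b).value);
  }
}
```

Keeping the parameter as Output<?>... sidesteps the warning at the cost of a looser signature, which is the trade-off made in the PR.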
// returned array contains both op handles and output indices, in pairs
jlongArray dy_handles_and_indices = env->NewLongArray(nx << 1);
jlong* dy_elems = env->GetLongArrayElements(dy_handles_and_indices, nullptr);
for (int i = 0, j = nx; i < nx; ++i, ++j) {
In practice this probably won't matter at all since dy_elems will be a pretty small array, but did you consider encoding it as [handle0, index0, handle1, index1, ...] instead of [handle0, handle1, ..., index0, index1, ...]? The former would be more cache friendly :)
Ironically, I was using the format you proposed in my previous commit. I switched to the new one lately because:
- it could become the 'standard' for returning more than one value from a JNI binding; e.g. if we had 4 arrays to return, each of a different length, the previous format wouldn't work while this one would
- we might find some optimization later to simply split the array in two instead of copying its elements one by one
- I personally found it more elegant to iterate using two iterators at the same time :)
I don't mind reverting to the previous version if you want.
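For reference, the two layouts under discussion can be sketched in plain Java (hypothetical helper names; the real code does this packing on the JNI side):

```java
import java.util.Arrays;

public class PairEncodingSketch {
  // Split-halves layout: [h0, h1, ..., i0, i1, ...] — generalizes to
  // returning several arrays of different lengths in one buffer.
  static long[] encodeSplit(long[] handles, long[] indices) {
    int n = handles.length;
    long[] out = new long[2 * n];
    System.arraycopy(handles, 0, out, 0, n);
    System.arraycopy(indices, 0, out, n, n);
    return out;
  }

  // Interleaved layout: [h0, i0, h1, i1, ...] — each pair is adjacent,
  // so decoding touches contiguous memory (the cache-friendliness point).
  static long[] encodeInterleaved(long[] handles, long[] indices) {
    int n = handles.length;
    long[] out = new long[2 * n];
    for (int i = 0; i < n; i++) {
      out[2 * i] = handles[i];
      out[2 * i + 1] = indices[i];
    }
    return out;
  }

  public static void main(String[] args) {
    long[] h = {100, 200};
    long[] idx = {0, 1};
    System.out.println(Arrays.toString(encodeSplit(h, idx)));
    System.out.println(Arrays.toString(encodeInterleaved(h, idx)));
  }
}
```

Either layout round-trips the same data; the difference is only in how a decoder walks the buffer.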
It's an internal detail, so we can easily switch it around as we go along. I don't feel very strongly, but am slightly partial to the previous version. Your call.
Ok, so just for the sake of keeping this PR moving smoothly, let's keep it that way for now.
@@ -0,0 +1,153 @@
/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.
Why training/? It seems like gradients could be a top-level operation too? One can use gradients for things other than training :)
No problem. Right now, the Op classification is still obscure to me; I'm taking some guesses here and there, but we will probably need to settle this at some point?
BTW, the Python code was recently updated to generate symbols in finer-grained namespaces (c1ff116#diff-d75a97d0f69d6f87ab79be6e2423d87b). So perhaps we could use the same for Java (without the baggage of backward compatibility; for example, Acos can be just in the math namespace and not in the top-level one).
If you're enthusiastic, I'd be more than happy to see a PR adding Java API defs :)
8965aed to b7baff7
Windows build failure seems to be flaky. Submitting now.
This calls the C API TF_AddGradients method through a new JNI binding for adding gradient nodes to a graph. It also includes an AddGradients wrapper for invoking this operation smoothly while building a graph using the new Java Ops API.