
Add eager mode gradients for ops. #108

Closed · 48 of 78 tasks
nsthorat opened this issue Apr 5, 2018 · 16 comments

@nsthorat
Contributor

nsthorat commented Apr 5, 2018

From @nsthorat on January 18, 2018 15:56

The infrastructure for eager mode is now ready for gradient methods to be filled in!

Eager mode provides a new set of methods on NDArrayMath which allow the user to eagerly compute gradients. Most users will use an optimizer like this:

const weights = dl.variable(Array2D.randNormal([784, 10]));
const cost = optimizer.minimize(() => {
  const batch = data.nextTrainBatch(BATCH_SIZE);
  const ys = math.matMul(batch.xs, weights);
  const loss = math.mean(math.softmaxCrossEntropyWithLogits(batch.labels, ys));
  return loss;
});

You'll notice that there is no use of the Graph; we simply call ops on NDArrayMath directly inside an optimizer.minimize() call.

You can find a full example of training MNIST in eager mode here: https://github.com/PAIR-code/deeplearnjs/blob/master/demos/mnist_eager/model.ts

As part of NDArrayMath we expose several new methods. The important ones are these:

  • math.gradients(f: () => cost, xs), which executes f() (which must produce a scalar value) and returns the gradient of the output of f with respect to xs (which can be an NDArray or a string => NDArray map).
  • math.valueAndGradients(f: () => cost, xs), which is the same as math.gradients() but also returns the output of f().
  • math.vjp(f: () => y, x, dy), which computes a vector-Jacobian product. It is similar to gradients, but allows f() to produce a non-scalar value and lets the user provide a dy. This is useful for computing a subset of backpropagation, or for testing the gradient of a single op with a provided dy (this is how we unit test).
  • math.customGradient(f: () => {value, gradients}, xs), which allows the user to provide a custom gradient for an arbitrary function closure instead of using the default gradients of the ops inside it. We use this for numerical stability in ops like softmaxCrossEntropy, and for mean / sum so we can compute a faster gradient (instead of combining the gradients of the kernels they use). Most of the time, you shouldn't need to use this.
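
For orientation, here is a minimal usage sketch of the first three methods. It assumes math is the same NDArrayMath instance used above and that Scalar is in scope, as in the deeplearn.js API of the time; the destructured return shape of valueAndGradients is an assumption, and exact signatures may differ between releases.

const x = Scalar.new(3);

// gradients: f() must return a scalar; the result is df/dx (here 2 * 3 = 6).
const dfdx = math.gradients(() => math.square(x), x);

// valueAndGradients: same, but also returns the value of f() (here 9).
// (The {value, gradients} shape shown here is an assumption.)
const {value, gradients} = math.valueAndGradients(() => math.square(x), x);

// vjp: f() may return a non-scalar and the caller supplies dy explicitly;
// this is how single-op gradients are unit tested.
const dx = math.vjp(() => math.square(x), x, Scalar.new(1));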

Now that these methods exist and are relatively stable, we can flesh out gradients for kernels and ops!

To add gradients for kernels, we simply need to add a derivative function to the executeKernel calls inside NDArrayMath. An example (from matMul):

// Inside NDArrayMath.matMul(a, b, aOrientation, bOrientation):
const der = (dy: Array2D<'float32'>, y: Array2D) => {
  return {
    // For the regular/regular case: dA = dy x B^T and dB = A^T x dy.
    a: () => this.matMul(dy, b, MatrixOrientation.REGULAR, MatrixOrientation.TRANSPOSED),
    b: () => this.matMul(a, dy, MatrixOrientation.TRANSPOSED, MatrixOrientation.REGULAR)
  };
};
return this.backendEngine.executeKernel(
    'MatMul', {inputs: {a, b}, args: {aOrientation, bOrientation}}, der);

The derivative is a function that takes dy and y and returns an object whose keys are the inputs (as defined by the inputs argument to executeKernel) and whose values are functions that return the derivative with respect to that input. These derivative functions should not call executeKernel; instead, they should call math ops directly (this is so we can compute second-order gradients).
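
As a simpler, hypothetical illustration of the same pattern (not copied from the library source; the kernel name and surrounding details are illustrative), a unary op such as exp could register its derivative like this:

// Inside a hypothetical NDArrayMath.exp(x):
const der = (dy: NDArray, y: NDArray) => {
  return {
    // y = exp(x), so dy/dx = exp(x) = y, and therefore dx = dy * y.
    // Note: this uses the math op this.multiply rather than executeKernel,
    // so that second-order gradients remain possible.
    x: () => this.multiply(dy, y)
  };
};
return this.backendEngine.executeKernel('Exp', {inputs: {x}}, der);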

Two example PRs adding gradients:
tensorflow/tfjs-core#521
tensorflow/tfjs-core#544

Note that we already have many gradients in the Graph layer; we just need to port them over to the gradient functions defined in eager mode.

Here is the list of ops and whether the gradient has been implemented:

  • abs
  • acos
  • add
  • argmax
  • argmin (not important)
  • asin
  • atan
  • avgPool
  • batchNormalization
  • cast
  • ceil
  • clip
  • clone
  • concat
  • conv1D
  • conv2D
  • conv2DDerBias (would be a second order der)
  • conv2DDerFilter (would be a second order der)
  • conv2DDerInput (would be a second order der)
  • conv2DTranspose
  • cos
  • cosh
  • depthwiseConv2D
  • divide
  • elu
  • eluDer
  • equal
  • exp
  • floor
  • greater
  • greaterEqual
  • leakyRelu
  • less
  • lessEqual
  • localResponseNormalization
  • log
  • logicalOr
  • logicalAnd
  • matMul (needs derivatives when using transposed bit)
  • max
  • maximum
  • maxPool
  • maxPoolBackprop (would be second order der)
  • min
  • minimum
  • minPool
  • multinomial
  • multiply
  • neg
  • notEqual
  • oneHot
  • pad
  • pow (half implemented, needs broadcast + derB)
  • prelu
  • preluDer (would be second order der)
  • relu
  • reshape
  • resizeBilinear3D
  • reverse
  • selu
  • sigmoid
  • sin
  • sinh
  • slice
  • softmax
  • softmaxCrossEntropyWithLogits
  • sqrt
  • square
  • step
  • sub
  • sum
  • tan
  • where
  • tanh
  • tile
  • topK
  • transpose

Copied from original issue: tensorflow/tfjs-core#561

@nsthorat
Contributor Author

nsthorat commented Apr 5, 2018

From @dsmilkov on January 18, 2018 16:15

For those who are interested:

  • Leave a comment here to claim one or several ops.
  • Make a PR that adds gradients for those ops.
  • After we merge your PR, we'll check the checkbox.

I'd suggest starting with the unary ops.

@nsthorat
Contributor Author

nsthorat commented Apr 5, 2018

From @israelvicars on January 19, 2018 17:20

I'll start abs. Thanks for tweeting the invitation to contribute.

@nsthorat
Contributor Author

nsthorat commented Apr 5, 2018

From @manrajgrover on January 28, 2018 10:27

@nsthorat @dsmilkov A little guidance required here. What should the gradient of comparison and logical operations look like? Any good sources for the same?

@nsthorat
Contributor Author

nsthorat commented Apr 5, 2018

From @gena on January 28, 2018 19:36

Added implementation for sigmoid, PR #603.

@nsthorat
Contributor Author

nsthorat commented Apr 5, 2018

From @manrajgrover on February 9, 2018 13:57

@nsthorat We can tick off ceil, clip, cosh, floor, maximum, minimum, selu, sigmoid, sinh, softmax and tanh.

prelu, elu are partially implemented. Only gradients w.r.t. alpha need to be added.

leakyRelu and step gradients are in progress.

Since logical and comparison operations don't have an actual gradient, should we return NaN, return zeros, or not pass a gradient function to executeKernel?

Also, it looks like unary and binary operations are now almost covered. I'd need pointers on the next set to focus on.

@nsthorat
Contributor Author

nsthorat commented Apr 5, 2018

From @dsmilkov on February 9, 2018 22:51

@manrajgrover Great q. For now, let's leave out logical and comparison ops and not pass a gradient function to executeKernel. Next would be reverse, slice, pad, concat (in that order). Thanks!

@nsthorat
Contributor Author

nsthorat commented Apr 5, 2018

From @dsmilkov on February 19, 2018 3:30

I'll be taking reverse, slice, pad and concat (already started work). Thanks!

@nsthorat
Contributor Author

nsthorat commented Apr 5, 2018

From @easadler on April 5, 2018 1:14

Seems like this may be a little out of date, but I'd like to help if you'd still like it. I think oneHot is still available.

@nsthorat
Contributor Author

nsthorat commented Apr 5, 2018

A high-priority one would be batchNorm (for which you'd return gradients for all of the parameters; this should be pretty straightforward) or resizeBilinear, if you're willing to take it on! resizeBilinear has a bug filed here: #38

@nsthorat
Contributor Author

nsthorat commented Apr 5, 2018

From @easadler on April 5, 2018 14:53

I will give batchNorm a shot!

@tafsiri
Contributor

tafsiri commented Apr 13, 2018

I will work on resizeBilinear

@jgartman
Contributor

I can finish up pow.

@jgartman
Contributor

I'll finish matMul as well.

@jgartman
Contributor

I'll try localResponseNormalization

dsmilkov pushed a commit to tensorflow/tfjs-core that referenced this issue Jul 14, 2018
This PR adds the gradient for LRN.  tensorflow/tfjs#108

FEATURE
@nsthorat
Contributor Author

Closing this out in favor of individual issues.

@generic-github-user

Is there an updated list of which functions have gradients implemented and which do not?

nsthorat pushed a commit that referenced this issue Aug 19, 2019
* quick test.

* bump travis

* Revert change.