Add eager mode gradients for ops. #108
From @israelvicars on January 19, 2018 17:20: I'll start
From @manrajgrover on January 28, 2018 10:27: @nsthorat @dsmilkov A little guidance required here. What should the gradients of comparison and logical operations look like? Any good sources for the same?
From @manrajgrover on February 9, 2018 13:57: @nsthorat We can tick off …
Since logical and comparison operations don't have an actual gradient, should we return …? Also, it looks like unary and binary operations are now almost covered. I would need pointers on the next set to focus on.
From @dsmilkov on February 9, 2018 22:51: @manrajgrover Great q. For now, let's leave out logical and comparison ops and not pass a gradient function to `executeKernel`.
From @dsmilkov on February 19, 2018 3:30: I'll be taking reverse, slice, pad and concat (already started work). Thanks!
From @easadler on April 5, 2018 1:14: Seems like this may be a little out of date, but I'd like to help if you still would like help. I think …
A high-priority one would be batchNorm (for which you'd return gradients for all of the parameters; this should be pretty straightforward) or resizeBilinear, if you're willing to take it on! resizeBilinear has a bug filed here: #38
From @easadler on April 5, 2018 14:53: I will give …
I will work on …
I can finish up …
I'll finish …
I'll try …
This PR adds the gradient for LRN. tensorflow/tfjs#108 FEATURE
Closing this out in favor of individual issues.
Is there an updated list of which functions have gradients implemented and which do not?
From @nsthorat on January 18, 2018 15:56
The infrastructure for eager mode is now ready for gradient methods to be filled in!
Eager mode provides a new set of methods on `NDArrayMath` which allow the user to eagerly compute gradients. Most users will use an optimizer, as in the sketch below. You'll notice that there is no use of the Graph; we simply use ops on `NDArrayMath` directly inside of an `optimizer.minimize()` method.

You can find a full example of training MNIST in eager mode here: https://github.com/PAIR-code/deeplearnjs/blob/master/demos/mnist_eager/model.ts
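The inline code snippet referenced above did not survive the copy. As a rough, illustrative sketch (not the original code), eager-mode training with an optimizer looked roughly like this; the `SGDOptimizer` construction, `dl.variable`, and the specific math ops used are assumptions about the deeplearn.js API of the time:

```ts
import * as dl from 'deeplearn';

// Illustrative sketch only: exact constructors and op names are assumptions.
const math = dl.ENV.math;
const optimizer = new dl.SGDOptimizer(0.1 /* learning rate */);

// A trainable weight and some fixed data.
const w = dl.variable(dl.Scalar.new(2.0));
const x = dl.Scalar.new(3.0);
const target = dl.Scalar.new(12.0);

for (let step = 0; step < 100; step++) {
  // No Graph is built: the ops inside the closure run eagerly, and
  // minimize() differentiates the returned scalar cost w.r.t. the variables.
  optimizer.minimize(() => {
    const prediction = math.multiply(w, x);
    return math.square(math.subtract(prediction, target));
  });
}
```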
As part of `NDArrayMath` we expose several new methods. The important ones are these (a usage sketch follows the list):

- `math.gradients(f: () => cost, xs)`, which executes `f()` (which must produce a scalar value) and returns the gradient of the output of `f` with respect to `xs` (which can be an NDArray or a string => NDArray map).
- `math.valueAndGradients(f: () => cost, xs)`, which is the same as `math.gradients()` but also returns the output of `f()`.
- `math.vjp(f: () => y, x, dy)`, which computes a vector-Jacobian product. It is similar to `gradients()`, but allows `f()` to produce a non-scalar value and lets the user provide a `dy`. This is useful for computing a subset of backpropagation, or for testing the gradient of a single op with a provided `dy` (this is how we unit test).
- `math.customGradient(f: () => {value, gradients}, xs)`, which allows the user to provide a custom gradient for an arbitrary function closure instead of using the default gradients of the ops in the function. We use this for numerical stability for ops like softmaxCrossEntropy, and for mean / sum so we can compute a faster gradient (instead of the combination of gradients of the kernels they use). Most of the time, you shouldn't need to use this.
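For illustration (this snippet is not from the original issue), a minimal sketch of calling the methods above might look like the following; the NDArray constructors and the exact return shapes are assumptions:

```ts
import * as dl from 'deeplearn';

// Illustrative sketch only; exact signatures and return shapes are assumptions.
const math = dl.ENV.math;

// d/dx of x^2, evaluated eagerly at x = 3 (expected value: 6).
const x = dl.Scalar.new(3.0);
const dydx = math.gradients(() => math.square(x), x);

// vjp: f() may return a non-scalar y, and an explicit dy of the same shape
// is supplied. This is the hook used to unit-test individual op gradients.
const v = dl.Array1D.new([1, 2, 3]);
const dy = dl.Array1D.new([1, 1, 1]);
const dv = math.vjp(() => math.square(v), v, dy);
```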
Now that these methods exist and are relatively stable, we can flesh out gradients for kernels and ops!

To add gradients for kernels, we simply need to add a derivative function to the `executeKernel` calls inside of `NDArrayMath`; a sketch is given below. The derivative is a function that takes `dy` and `y` and returns an object whose keys are the inputs (as defined by the `inputs` argument to `executeKernel`) and whose values are functions that return the derivative with respect to that input. These derivatives should not call `executeKernel`, but rather call math ops directly (this is so we can compute second-order gradients).
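The example code block from the original issue was lost in the copy. Below is a rough sketch of the pattern just described, using `exp` (whose derivative with respect to its input is `dy * y`); the `backendEngine` property and the exact `executeKernel` signature are assumptions about deeplearn.js internals of the time:

```ts
// Inside NDArrayMath: a sketch of attaching a gradient to an executeKernel call.
exp<T extends NDArray>(ndarray: T): T {
  const gradient = (dy: T, y: T) => {
    // One entry per key of `inputs`; each is a closure that returns the
    // derivative w.r.t. that input. Note that it calls a math op
    // (this.multiply), not executeKernel, so second-order gradients work.
    return {x: () => this.multiply(dy, y)};
  };
  return this.backendEngine.executeKernel(
      'Exp', {inputs: {x: ndarray}}, gradient) as T;
}
```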
Two example PRs adding gradients:
tensorflow/tfjs-core#521
tensorflow/tfjs-core#544
Note that we have lots of gradients in the Graph layer already; we just need to move them over to the gradients defined in eager mode.
Here is the list of ops and whether the gradient has been implemented:
Copied from original issue: tensorflow/tfjs-core#561