input and output of @tf.custom_gradient #21756

Closed

huangbiubiu opened this issue Aug 21, 2018 · 5 comments

huangbiubiu commented Aug 21, 2018

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): TensorFlow 1.10
  • Python version: Python 3.6.5 by Anaconda
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: CUDA 9.0/ cuDNN 7.1
  • GPU model and memory: NVIDIA GeForce GTX 1080Ti 11G
  • Exact command to reproduce: N/A

Describe the problem

I am confused about the input and output of tf.custom_gradient.

Input

The documentation says that x is a Tensor or a sequence of Tensor inputs to the function. However, with multiple inputs, f actually takes N positional arguments rather than a sequence of Tensors. I think this is a mistake in the documentation. A sequence of Tensors cannot be passed to f, as the following code reproduces:

import tensorflow as tf


def self_define_op_multiple_inputs():
    @tf.custom_gradient
    def loss_func(input_):
        x = input_[0]
        label = input_[1]

        def grad(dy):
            # One gradient per input.
            return [dy, dy]

        return x - tf.cast(label, tf.float32), grad

    x = tf.range(10, dtype=tf.float32)
    y = tf.range(10, dtype=tf.int32)

    # Fails: custom_gradient tries to convert [x, y] into a single Tensor.
    loss = loss_func([x, y])


if __name__ == '__main__':
    self_define_op_multiple_inputs()

custom_gradient tries to convert [x, y] into a single Tensor and raises an error:

/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "/home/hyh/projects/benchmark/test.py", line 280, in <module>
    self_define_op_multiple_inputs()
  File "/home/hyh/projects/benchmark/test.py", line 276, in self_define_op_multiple_inputs
    loss = loss_func([x, y])
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 111, in decorated
    return _graph_mode_decorator(f, *args, **kwargs)
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 124, in _graph_mode_decorator
    args = [ops.convert_to_tensor(x) for x in args]
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 124, in <listcomp>
    args = [ops.convert_to_tensor(x) for x in args]
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 998, in convert_to_tensor
    as_ref=False)
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1094, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 961, in _autopacking_conversion_function
    return _autopacking_helper(v, inferred_dtype, name or "packed")
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 903, in _autopacking_helper
    elem))
TypeError: Cannot convert a list containing a tensor of dtype <dtype: 'int32'> to <dtype: 'float32'> (Tensor is: <tf.Tensor 'range_1:0' shape=(10,) dtype=int32>)

Switching to positional arguments fixes the problem:

@tf.custom_gradient
def loss_func(x, label):
    def grad(dy):
        return [dy, dy]

    return x - tf.cast(label, tf.float32), grad
Related discussion can be found at https://stackoverflow.com/questions/51836242/tf-custom-gradient-with-multiple-inputs.
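
For completeness, here is the positional-argument version running end to end (a minimal sketch, assuming TF 1.x graph mode as above):

import tensorflow as tf


@tf.custom_gradient
def loss_func(x, label):
    def grad(dy):
        # One gradient per positional input.
        return [dy, dy]

    return x - tf.cast(label, tf.float32), grad


x = tf.range(10, dtype=tf.float32)
y = tf.range(10, dtype=tf.int32)

loss = loss_func(x, y)  # positional arguments, not a list
dloss_dx = tf.gradients(loss, x)

with tf.Session() as sess:
    print(sess.run(dloss_dx))  # gradient w.r.t. x is all ones, shape (10,)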

Output

This problem is about the output of grad_fn.
According to the documentation, grad_vars is a list<Tensor> with the derivatives of the Tensors in y with respect to the variables, and the signature of grad_fn is g(*grad_ys, variables=None).

  1. Does variables contain the original variables, or the gradients of the variables (like grad_ys)?
  2. Returning grad_vars as a list<Tensor> raises an error:
import tensorflow as tf


def self_define_op_multiple_inputs():
    @tf.custom_gradient
    def loss_func(x):
        w = tf.get_variable("margin_inner_product_layer/W", shape=(1,), dtype=tf.float32,
                            initializer=tf.constant_initializer([10]), use_resource=True)

        def grad(dy, variables=None):
            return dy, [variables]  # just for testing

        return tf.multiply(x, w), grad

    x = tf.constant([5], dtype=tf.float32, shape=(1,))

    loss = loss_func(x)
    dl = tf.gradients(loss, x)  # raises here, at graph-construction time

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        derivative = sess.run(dl)
        print(derivative)


if __name__ == '__main__':
    self_define_op_multiple_inputs()

It seems that grad_vars is handled as a single Tensor:

/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "/home/hyh/projects/benchmark/test.py", line 259, in <module>
    self_define_op_multiple_inputs()
  File "/home/hyh/projects/benchmark/test.py", line 251, in self_define_op_multiple_inputs
    dl = tf.gradients(loss, x)
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 596, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 795, in _GradientsHelper
    _LogOpGradients(op, out_grads, in_grads)
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 945, in _LogOpGradients
    ", ".join([x.name for x in in_grads if _FilterGrad(x)]))
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 945, in <listcomp>
    ", ".join([x.name for x in in_grads if _FilterGrad(x)]))
AttributeError: 'list' object has no attribute 'name'

Changing grad_vars to a Tensor doesn't work either:

import tensorflow as tf


def self_define_op_multiple_inputs():
    @tf.custom_gradient
    def loss_func(x):
        w = tf.get_variable("margin_inner_product_layer/W", shape=(1,), dtype=tf.float32,
                            initializer=tf.constant_initializer([10]), use_resource=True)

        def grad(dy, variables=None):
            return dy, variables  # just for testing

        return tf.multiply(x, w), grad

    x = tf.constant([5], dtype=tf.float32, shape=(1,))

    loss = loss_func(x)
    dl = tf.gradients(loss, x)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        derivative = sess.run(dl)
        print(derivative)


if __name__ == '__main__':
    self_define_op_multiple_inputs()

cy89 commented Aug 31, 2018

@andydavis1 would you PTAL, or reassign to someone who knows the custom gradients code?

asimshankar (Contributor) commented:

The documentation could be improved here. Saying "x can be a list of Tensors" is confusing. What we really wanted to convey was that f can be a function with multiple arguments, not just a single Tensor.

CC @alextp

@DSRYhh - do you have suggestions for better phrasing?

alextp (Contributor) commented Aug 31, 2018

I'm preparing a PR which removes "list" from the documentation, fixing the issue you saw there.

In your last example, the correct way to do this is:

@tf.custom_gradient
def loss_func(x):
    w = tf.get_variable("margin_inner_product_layer/W", shape=(1,), dtype=tf.float32,
                        initializer=tf.constant_initializer([10]), use_resource=True)

    def grad(dy, variables=None):
        # Placeholder gradients: one list entry per variable in `variables`.
        return dy, [dy for v in variables]

    return tf.multiply(x, w), grad

That is, when variables is not None, the second return value should be a list with one element per variable in variables. I'll clarify the documentation there too.
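
As a quick sanity check (a minimal sketch, assuming TF 1.x graph mode and the loss_func defined above), the corrected grad fn builds and runs:

x = tf.constant([5.0])
loss = loss_func(x)
dl_dx = tf.gradients(loss, x)  # no longer raises

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # The custom grad fn passes dy straight through for x,
    # so this prints [array([1.], dtype=float32)].
    print(sess.run(dl_dx))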

huangbiubiu (Author) commented:

@alextp To be clear: for the second parameter (and the second return value), grad_fn accepts the original variables (not the gradients of the variables) and returns the gradients of those variables. Is that correct?

If that's correct, why doesn't grad_fn accept the gradients of the variables instead of the original variables, to be consistent with grad_ys (the first parameter)? In that case we could obtain the derivatives of the variables from automatic differentiation instead of writing them manually.
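
For concreteness, here is what writing the variable gradient manually looks like for the x * w example above (a sketch; the math is d(x*w)/dx = w and d(x*w)/dw = x):

@tf.custom_gradient
def loss_func(x):
    w = tf.get_variable("margin_inner_product_layer/W", shape=(1,), dtype=tf.float32,
                        initializer=tf.constant_initializer([10]), use_resource=True)

    def grad(dy, variables=None):
        # Derived by hand instead of by autodiff:
        return dy * w, [dy * x]  # d(x*w)/dx = w, d(x*w)/dw = x

    return tf.multiply(x, w), grad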

alextp (Contributor) commented Sep 11, 2018 via email
