input and output of @tf.custom_gradient #21756

Closed

huangbiubiu opened this issue Aug 21, 2018 · 5 comments

huangbiubiu commented Aug 21, 2018

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): TensorFlow 1.10
  • Python version: Python 3.6.5 by Anaconda
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: CUDA 9.0/ cuDNN 7.1
  • GPU model and memory: NVIDIA GeForce GTX 1080Ti 11G
  • Exact command to reproduce: N/A

Describe the problem

I am confused about the input and output of tf.custom_gradient.

Input

The documentation says that x is a Tensor or a sequence of Tensor inputs to the function. However, with multiple inputs, f actually takes N positional arguments rather than a sequence of Tensors. I think this is a mistake in the documentation. A sequence of Tensors cannot be passed to f, as the following code reproduces:

import tensorflow as tf


def self_define_op_multiple_inputs():
    @tf.custom_gradient
    def loss_func(input_):
        x = input_[0]
        label = input_[1]

        def grad(dy):
            # One gradient per input.
            return [dy, dy]

        return x - tf.cast(label, tf.float32), grad

    x = tf.range(10, dtype=tf.float32)
    y = tf.range(10, dtype=tf.int32)

    # Fails: custom_gradient tries to convert [x, y] into a single Tensor.
    loss = loss_func([x, y])


if __name__ == '__main__':
    self_define_op_multiple_inputs()

custom_gradient tries to convert [x, y] into a single Tensor and raises an error:

/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "/home/hyh/projects/benchmark/test.py", line 280, in <module>
    self_define_op_multiple_inputs()
  File "/home/hyh/projects/benchmark/test.py", line 276, in self_define_op_multiple_inputs
    loss = loss_func([x, y])
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 111, in decorated
    return _graph_mode_decorator(f, *args, **kwargs)
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 124, in _graph_mode_decorator
    args = [ops.convert_to_tensor(x) for x in args]
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 124, in <listcomp>
    args = [ops.convert_to_tensor(x) for x in args]
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 998, in convert_to_tensor
    as_ref=False)
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1094, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 961, in _autopacking_conversion_function
    return _autopacking_helper(v, inferred_dtype, name or "packed")
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 903, in _autopacking_helper
    elem))
TypeError: Cannot convert a list containing a tensor of dtype <dtype: 'int32'> to <dtype: 'float32'> (Tensor is: <tf.Tensor 'range_1:0' shape=(10,) dtype=int32>)

Switching to positional arguments fixes the problem:

@tf.custom_gradient
def loss_func(x, label):
    def grad(dy):
        return [dy, dy]

    return x - tf.cast(label, tf.float32), grad
Related discussion can be found at https://stackoverflow.com/questions/51836242/tf-custom-gradient-with-multiple-inputs.
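
For completeness, here is the positional-argument version running end to end (a minimal sketch, assuming TF 1.x graph mode as above):

import tensorflow as tf


@tf.custom_gradient
def loss_func(x, label):
    def grad(dy):
        # One gradient per positional input.
        return [dy, dy]

    return x - tf.cast(label, tf.float32), grad


x = tf.range(10, dtype=tf.float32)
y = tf.range(10, dtype=tf.int32)

loss = loss_func(x, y)  # positional arguments, not a list
dloss_dx = tf.gradients(loss, x)

with tf.Session() as sess:
    print(sess.run(dloss_dx))  # gradient w.r.t. x is all ones, shape (10,)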

Output

This problem is about the output of grad_fn.
According to the documentation, grad_vars is a list<Tensor> with the derivatives of the Tensors in y with respect to the variables, and the signature of grad_fn is g(*grad_ys, variables=None).

  1. Does variables contain the original variables, or the gradients of the variables (like grad_ys)?
  2. Returning grad_vars as a list<Tensor> raises an error:
import tensorflow as tf


def self_define_op_multiple_inputs():
    @tf.custom_gradient
    def loss_func(x):
        w = tf.get_variable("margin_inner_product_layer/W", shape=(1,), dtype=tf.float32,
                            initializer=tf.constant_initializer([10]), use_resource=True)

        def grad(dy, variables=None):
            return dy, [variables]  # just for testing

        return tf.multiply(x, w), grad

    x = tf.constant([5], dtype=tf.float32, shape=(1,))

    loss = loss_func(x)
    dl = tf.gradients(loss, x)  # raises here, at graph-construction time

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        derivative = sess.run(dl)
        print(derivative)


if __name__ == '__main__':
    self_define_op_multiple_inputs()

It seems that grad_vars is handled as a single Tensor:

/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "/home/hyh/projects/benchmark/test.py", line 259, in <module>
    self_define_op_multiple_inputs()
  File "/home/hyh/projects/benchmark/test.py", line 251, in self_define_op_multiple_inputs
    dl = tf.gradients(loss, x)
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 596, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 795, in _GradientsHelper
    _LogOpGradients(op, out_grads, in_grads)
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 945, in _LogOpGradients
    ", ".join([x.name for x in in_grads if _FilterGrad(x)]))
  File "/home/hyh/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 945, in <listcomp>
    ", ".join([x.name for x in in_grads if _FilterGrad(x)]))
AttributeError: 'list' object has no attribute 'name'

Changing grad_vars to a Tensor doesn't work either:

import tensorflow as tf


def self_define_op_multiple_inputs():
    @tf.custom_gradient
    def loss_func(x):
        w = tf.get_variable("margin_inner_product_layer/W", shape=(1,), dtype=tf.float32,
                            initializer=tf.constant_initializer([10]), use_resource=True)

        def grad(dy, variables=None):
            return dy, variables  # just for testing

        return tf.multiply(x, w), grad

    x = tf.constant([5], dtype=tf.float32, shape=(1,))

    loss = loss_func(x)
    dl = tf.gradients(loss, x)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        derivative = sess.run(dl)
        print(derivative)


if __name__ == '__main__':
    self_define_op_multiple_inputs()

cy89 commented Aug 31, 2018

@andydavis1 would you PTAL, or reassign to someone who knows the custom gradients code?

asimshankar (Contributor) commented:

The documentation could be improved here. Saying "x can be a list of Tensors" is confusing. What we really wanted to convey was that f can be a function with multiple arguments, not just a single Tensor.

CC @alextp

@DSRYhh - do you have suggestions for better phrasing?

alextp (Contributor) commented Aug 31, 2018

I'm preparing a PR which removes "list" from the documentation, fixing the issue you saw there.

In your last example, the correct way to do this is:

@tf.custom_gradient
def loss_func(x):
    w = tf.get_variable("margin_inner_product_layer/W", shape=(1,), dtype=tf.float32,
                        initializer=tf.constant_initializer([10]), use_resource=True)

    def grad(dy, variables=None):
        # Placeholder gradients: one list entry per variable in `variables`.
        return dy, [dy for v in variables]

    return tf.multiply(x, w), grad

That is, when variables is not None, the second return value should be a list with one element per variable in variables. I'll clarify the documentation there too.
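
As a quick sanity check (a minimal sketch, assuming TF 1.x graph mode and the loss_func defined above), the corrected grad fn builds and runs:

x = tf.constant([5.0])
loss = loss_func(x)
dl_dx = tf.gradients(loss, x)  # no longer raises

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # The custom grad fn passes dy straight through for x,
    # so this prints [array([1.], dtype=float32)].
    print(sess.run(dl_dx))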

huangbiubiu (Author) commented:

@alextp To be clear: for the second parameter (and the second return value), grad_fn accepts the original variables (not the gradients of the variables) and returns the gradients of those variables. Is that correct?

If that's correct, why doesn't grad_fn accept the gradients of the variables instead of the original variables, to be consistent with grad_ys (the first parameter)? In that case we could obtain the derivatives of the variables from automatic differentiation instead of writing them manually.
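
For concreteness, here is what writing the variable gradient manually looks like for the x * w example above (a sketch; the math is d(x*w)/dx = w and d(x*w)/dw = x):

@tf.custom_gradient
def loss_func(x):
    w = tf.get_variable("margin_inner_product_layer/W", shape=(1,), dtype=tf.float32,
                        initializer=tf.constant_initializer([10]), use_resource=True)

    def grad(dy, variables=None):
        # Derived by hand instead of by autodiff:
        return dy * w, [dy * x]  # d(x*w)/dx = w, d(x*w)/dw = x

    return tf.multiply(x, w), grad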

alextp (Contributor) commented Sep 11, 2018 via email
