
Cannot calculate tf.gradients wrt embedding_matrix #23033

Closed
yifannieudem opened this issue Oct 17, 2018 · 1 comment
@yifannieudem


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Debian 8.1
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): v1.3.0-rc2-20-g0787eee 1.3.0
  • Python version: 3.6.3
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 8/6.0.21
  • GPU model and memory: Titan Xp
  • Exact command to reproduce:

You can collect some of this information using our environment capture script:

https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh

You can obtain the TensorFlow version with:

```shell
python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
```

Describe the problem

I used TensorFlow to implement an end-to-end LambdaRank retrieval model. There are three modules: rep_module, inter_module, and L2R_module. The embedding matrix (emb_mat) is defined in L2R_module and passed as a parameter to rep_module and inter_module, so the aggregating L2R module has:

```python
self.emb_mat = tf.get_variable(
    "emb_mat", shape=[self.vocab_size, self.emb_dim], dtype=tf.float32)
self.rep_mod = RepModule(...., emb_mat=self.emb_mat)
self.inter_mod = InterModule(..., emb_mat=self.emb_mat)
```
The goal is to share emb_mat between the rep and inter modules and learn it jointly with both. The L2R module outputs a batch of scores: score = (batch_size, 1).

Then I have another, higher-level LambdaRank module that calculates the gradients by hand (I cannot use a built-in, off-the-shelf optimizer, since I have to get the gradients and multiply them with the LambdaRank terms). I have a `_jacobian(y, x)` function as follows:

```python
def _jacobian(self, y_flat, x):
    """
    #675
    for ranknet and lambdarank
    """
    loop_vars = [
        tf.constant(0, tf.int32),
        tf.TensorArray(tf.float32, size=self.batch_size),
    ]

    _, jacobian = tf.while_loop(
        lambda j, _: j < self.batch_size,
        lambda j, result: (j + 1, result.write(j, tf.gradients(y_flat[j], x))),
        loop_vars)

    return jacobian.stack()
```

which calculates the gradient of each element of y wrt x and reassembles them. I can get the Jacobians of the scores wrt all other variables (model parameters; if I use a fixed embedding, emb_mat is no longer in trainable_variables()) except emb_mat. My other variables are `tf.layers`-style variables like `tf.Variable 'conv1/conv1d/kernel:0' shape=(3, 300, 256) dtype=float32_ref`, but it cannot calculate the gradients wrt emb_mat. It returns:

`TypeError: Failed to convert object of type <class 'list'> to Tensor. Contents: [<tensorflow.python.framework.ops.IndexedSlices object at 0x7f025e161f28>]`

and

`TypeError: Expected binary or unicode string, got <tensorflow.python.framework.ops.IndexedSlices object at 0x7f025e161f28>`

The whole traceback is attached in the logs section.
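For context on the error: `tf.gradients` wrt an embedding matrix returns a `tf.IndexedSlices` (a sparse pair of row values and row indices produced by the gather in the embedding lookup), wrapped in a list, and `TensorArray.write` cannot convert that list into a dense `Tensor`. Densifying such a sparse gradient is just a scatter-add of the slice values into a zero matrix. The following standalone NumPy sketch (the `densify` helper is hypothetical, not TensorFlow API) mirrors what converting an `IndexedSlices` to a dense tensor does:

```python
import numpy as np

def densify(values, indices, dense_shape):
    """Scatter-add sparse gradient rows into a dense matrix of the
    embedding's full shape, mirroring the dense conversion of a
    tf.IndexedSlices gradient."""
    dense = np.zeros(dense_shape, dtype=values.dtype)
    # np.add.at accumulates rows that share an index -- a word that
    # appears twice in the batch contributes two rows with the same
    # index to the embedding gradient, and their values must be summed.
    np.add.at(dense, indices, values)
    return dense

# Sparse gradient wrt a 4-word, 2-dim embedding matrix, where the
# batch contained word 1, word 3, and word 1 again:
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
indices = np.array([1, 3, 1])
grad = densify(values, indices, (4, 2))
```

Rows 0 and 2 of `grad` stay zero (those words never appeared), row 3 is [3, 4], and row 1 accumulates [1, 2] + [5, 6] = [6, 8].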

Source code / logs

```python
class L2R_Model(object):
    .....
    self.emb_mat = tf.get_variable(
        "emb_mat", shape=[self.vocab_size, self.emb_dim], dtype=tf.float32,
        initializer=tf.orthogonal_initializer(1.0))
    self.rep_mod = RepModule(...., emb_mat=self.emb_mat)
    self.inter_mod = InterModule(..., emb_mat=self.emb_mat)

    def _jacobian(self, y_flat, x):
        """
        #675
        for ranknet and lambdarank
        """
        loop_vars = [
            tf.constant(0, tf.int32),
            tf.TensorArray(tf.float32, size=self.batch_size),
        ]

        _, jacobian = tf.while_loop(
            lambda j, _: j < self.batch_size,
            lambda j, result: (j + 1, result.write(j, tf.gradients(y_flat[j], x))),
            loop_vars)

        return jacobian.stack()
```

```
Traceback (most recent call last):
  File "/u/nieyifan/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 460, in make_tensor_proto
    str_values = [compat.as_bytes(x) for x in proto_values]
  File "/u/nieyifan/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 460, in <listcomp>
    str_values = [compat.as_bytes(x) for x in proto_values]
  File "/u/nieyifan/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 65, in as_bytes
    (bytes_or_text,))
TypeError: Expected binary or unicode string, got <tensorflow.python.framework.ops.IndexedSlices object at 0x7f025e161f28>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main_lambda.py", line 234, in <module>
    debug()
  File "main_lambda.py", line 230, in debug
    inter_param_dict=inter_param_dict, resume=r_flag)
  File "/u/nieyifan/projects/L2R_RM/L2R_LambdaRank.py", line 25, in __init__
    self.loss, self.num_pairs, self.score, self.train_op = self._build_model()
  File "/u/nieyifan/projects/L2R_RM/L2R_LambdaRank.py", line 215, in _build_model
    grads = [self._get_derivative(score, Wk, lambda_ij) for Wk in vars]
  File "/u/nieyifan/projects/L2R_RM/L2R_LambdaRank.py", line 215, in <listcomp>
    grads = [self._get_derivative(score, Wk, lambda_ij) for Wk in vars]
  File "/u/nieyifan/projects/L2R_RM/L2R_LambdaRank.py", line 112, in _get_derivative
    dsi_dWk = self._jacobian(score, Wk)  # (BS, )
  File "/u/nieyifan/projects/L2R_RM/L2R_LambdaRank.py", line 98, in _jacobian
    loop_vars)
  File "/u/nieyifan/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2775, in while_loop
    result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "/u/nieyifan/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2604, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/u/nieyifan/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2554, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/u/nieyifan/projects/L2R_RM/L2R_LambdaRank.py", line 97, in <lambda>
    lambda j, result: (j + 1, result.write(j, tf.gradients(y_flat[j], x))),
  File "/u/nieyifan/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py", line 175, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))
  File "/u/nieyifan/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/tensor_array_ops.py", line 302, in write
    value = ops.convert_to_tensor(value, name="value")
  File "/u/nieyifan/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 611, in convert_to_tensor
    as_ref=False)
  File "/u/nieyifan/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 676, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/u/nieyifan/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 121, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/u/nieyifan/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 102, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/u/nieyifan/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 464, in make_tensor_proto
    "supported type." % (type(values), values))
TypeError: Failed to convert object of type <class 'list'> to Tensor. Contents: [<tensorflow.python.framework.ops.IndexedSlices object at 0x7f025e161f28>]. Consider casting elements to a supported type.
```

@wt-huang
@yifannieudem You can probably use tf.gradients(loss, embeddings) for your case, which will give a tf.IndexedSlices object corresponding to the gradients of the embeddings. You can also use optimizer.apply_gradients to aggregate gradients for repeated word units.

Manual calculation via the Jacobian is another way of doing it; just make sure that all the types match.
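On the "aggregate gradients for repeating word units" point: when the same word index appears more than once in a batch, the IndexedSlices gradient carries one row per occurrence, and an optimizer sums those rows before applying the update. A minimal NumPy sketch of that aggregation step (the `deduplicate_slices` helper is hypothetical, not a TensorFlow API):

```python
import numpy as np

def deduplicate_slices(values, indices):
    """Sum gradient rows that share an embedding-row index, keeping the
    result sparse -- one (summed) row per unique word index.  This is
    the aggregation an optimizer performs for repeated word units
    before applying an IndexedSlices gradient."""
    unique, inverse = np.unique(indices, return_inverse=True)
    summed = np.zeros((unique.size, values.shape[1]), dtype=values.dtype)
    # inverse[i] is the position of indices[i] within `unique`, so this
    # scatter-add accumulates duplicate rows onto a single output row.
    np.add.at(summed, inverse, values)
    return summed, unique

# Word 1 appears twice in the batch, so its two gradient rows are summed:
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
indices = np.array([1, 3, 1])
summed, unique = deduplicate_slices(values, indices)
```

The result keeps one row per unique index ([1, 3]), with word 1's contributions summed to [6, 8].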

@wt-huang wt-huang closed this as completed Nov 2, 2018