Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MultivariateNormalDiag probability gradient fails #10766

Closed
ktarplee opened this issue Jun 16, 2017 · 10 comments
Closed

MultivariateNormalDiag probability gradient fails #10766

ktarplee opened this issue Jun 16, 2017 · 10 comments
Assignees
Labels
stat:awaiting tensorflower Status - Awaiting response from tensorflower type:bug Bug

Comments

@ktarplee
Copy link

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS 10.12.5
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): v1.2.0-rc2-0-gce1d6ec49 1.2.0-rc2
  • Bazel version (if compiling from source): 0.5.1-homebrew
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A
  • Exact command to reproduce: python test_diag.py

Describe the problem

The gradients for the tensorflow.contrib.distributions.MultiVariateNormalDiag throw an error. This might be related to #10149. Notice that the tensorflow.contrib.distributions.Normal works as expected so I think this is a bug in the gradient implementation of the MultiVarateNormalDiag.

Source code

import numpy as np
import tensorflow as tf
import tensorflow.contrib.distributions as td

assert_args = dict(validate_args=True, allow_nan_stats=False)

# tf Graph Input
x = tf.placeholder(shape=(4, 6), dtype=tf.float32, name="X")
y = tf.placeholder(shape=(4, 2), dtype=tf.float32, name="Y")

# Set model weights
b = tf.Variable(np.ones((4, 6)), dtype=np.float32, name="b")
W = tf.Variable(np.ones((6, 6)), dtype=np.float32, name="W")

pred_mu = tf.reshape(x@W + b, (4, 3, 2))
pred_sigma = tf.reshape(tf.exp(x@W + b), (4, 3, 2))

# fails below
dist = td.MultivariateNormalDiag(loc=pred_mu, scale_diag=pred_sigma, **assert_args)
prob = dist.prob(y[:, tf.newaxis])

# this works fine
# dist_x = td.Normal(loc=pred_mu[:, :, 0], scale=pred_sigma[:, :, 0], **assert_args)
# dist_y = td.Normal(loc=pred_mu[:, :, 1], scale=pred_sigma[:, :, 1], **assert_args)
# prob = dist_x.prob(y[:, 0, tf.newaxis]) * dist_y.prob(y[:, 1, tf.newaxis])

gradients = tf.gradients(prob, [b, W])

feed_dict = {
    x: np.ones((4, 6), dtype=np.float32),
    y: np.ones((4, 2), dtype=np.float32)
}

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(prob, feed_dict=feed_dict)  # works fine
    retval = sess.run(gradients, feed_dict=feed_dict)  # throws
    print(retval)

Output

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py", line 89, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = -1 is not in [0, 3)
	 [[Node: gradients/MultivariateNormalDiag_3/prob/Prod_grad/Gather = Gather[Tindices=DT_INT32, Tparams=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/MultivariateNormalDiag_3/prob/Prod_grad/Shape, gradients/MultivariateNormalDiag_3/prob/Prod_grad/Reshape)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_diag.py", line 41, in <module>
    retval = sess.run(gradients, feed_dict=feed_dict) # throws
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = -1 is not in [0, 3)
	 [[Node: gradients/MultivariateNormalDiag_3/prob/Prod_grad/Gather = Gather[Tindices=DT_INT32, Tparams=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/MultivariateNormalDiag_3/prob/Prod_grad/Shape, gradients/MultivariateNormalDiag_3/prob/Prod_grad/Reshape)]]

Caused by op 'gradients/MultivariateNormalDiag_3/prob/Prod_grad/Gather', defined at:
  File "test_diag.py", line 31, in <module>
    gradients = tf.gradients(prob, [b, W])
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 540, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 346, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 540, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py", line 129, in _ProdGrad
    reduced_num = math_ops.reduce_prod(array_ops.gather(input_shape, reduced))
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1179, in gather
    validate_indices=validate_indices, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

...which was originally created as op 'MultivariateNormalDiag_3/prob/Prod', defined at:
  File "test_diag.py", line 24, in <module>
    prob = dist.prob(y[:,tf.newaxis])
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/distributions/distribution.py", line 712, in prob
    return self._call_prob(value, name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/distributions/distribution.py", line 694, in _call_prob
    return self._prob(value, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/distributions/util.py", line 688, in _fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/distributions/python/ops/mvn_linear_operator.py", line 216, in _prob
    return super(MultivariateNormalLinearOperator, self)._prob(x)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/distributions/transformed_distribution.py", line 406, in _prob
    prob = math_ops.reduce_prod(prob, self._reduce_event_indices)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1392, in reduce_prod
    name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1488, in _prod
    keep_dims=keep_dims, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)

InvalidArgumentError (see above for traceback): indices[0] = -1 is not in [0, 3)
	 [[Node: gradients/MultivariateNormalDiag_3/prob/Prod_grad/Gather = Gather[Tindices=DT_INT32, Tparams=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/MultivariateNormalDiag_3/prob/Prod_grad/Shape, gradients/MultivariateNormalDiag_3/prob/Prod_grad/Reshape)]]
@ktarplee
Copy link
Author

ktarplee commented Jun 16, 2017

Also fails on the released version of tensorflow (v1.2.0) built from source.

@aselle aselle added stat:awaiting tensorflower Status - Awaiting response from tensorflower type:bug Bug labels Jun 16, 2017
@jakedailey1
Copy link

Also fails when gradients() calculated for loss w.r.t. weights drawn from MultivariateNormalDiag() (in my case with the help of contrib.bayesflow.stochastic_tensor.StochasticTensor). Using Normal() appears to be a (slower) workaround.

@ktarplee
Copy link
Author

I had a similar problem as @jakedailey1 which is why I distilled this example down from a Mixture Density Network with a likelihood-based loss function.

@langmore
Copy link
Contributor

This turns out to be an issue with tf.reduce_prod, whereby gradients fail upon reduction over negative indices. I'll get an internal fix started.

@RFHO-BDSS
Copy link

Wow! Just when I thought I was stuck I find there's a solution already on the way. Nice work :D
At least it looks like the same issue with gradient back propagation, as here on paste bin

How is it coming along?

@langmore
Copy link
Contributor

langmore commented Jul 1, 2017

It looks like a fix has been put together here. I can't try this out right now, but perhaps you can just grab the latest files and it will work.

@ktarplee
Copy link
Author

ktarplee commented Jul 2, 2017

I can confirm that ea79ba4 (mentioned above) fixes the issue. I cherry-picked it to v1.2.1 to verify. The fix is available on master.

@ktarplee ktarplee closed this as completed Jul 2, 2017
@pascalxia
Copy link

pascalxia commented Jul 11, 2017

I upgraded my Tensorflow to v1.2.1. But I am still facing the same error message when I am trying to train a model to learn an appropriate sigma value for the Gaussian filter. Am I doing something wrong? My system is Win10. I installed Tensorflow through Anaonda. Here is my code.

import tensorflow as tf
import tensorflow.contrib.distributions as ds
import numpy as np
import scipy.ndimage.filters as filters


#set parameters-------------------
IMAGE_SIZE = (576, 1024)
KERNEL_SIZE = 3

#build graph--------------
image = tf.placeholder(tf.float32, shape=(None,)+IMAGE_SIZE+(1,))
label = tf.placeholder(tf.float32, shape=(None,)+IMAGE_SIZE+(1,))
sigma = tf.Variable(tf.ones(1))

xinds, yinds = np.unravel_index(range(KERNEL_SIZE*KERNEL_SIZE),                                 (KERNEL_SIZE, KERNEL_SIZE))
inds = (np.column_stack((xinds,yinds))-
                    [(KERNEL_SIZE-1)/2, (KERNEL_SIZE-1)/2]).astype(np.float32)
inds = tf.constant(inds)
loc = tf.zeros(2)
scale_diag = tf.multiply(sigma, tf.ones([2,]))

mvn = ds.MultivariateNormalDiag(
    loc=loc,
    scale_diag=scale_diag)

kernel = mvn.prob(inds)
kernel = tf.reshape(kernel, (KERNEL_SIZE, KERNEL_SIZE, 1, 1))


x = tf.nn.conv2d(image, kernel, strides=[1,1,1,1], padding='SAME')
loss = tf.nn.l2_loss(tf.subtract(x, label))

train_op = tf.train.AdamOptimizer().minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
#start training-------------------------
stimulus = np.zeros(IMAGE_SIZE)
stimulus[300, 500] = 1
test_sigma = (30,30)
filtered = filters.gaussian_filter(stimulus, test_sigma)
feed_dict = {image: np.expand_dims(np.expand_dims(stimulus, 0), 3), 
                 label: np.expand_dims(np.expand_dims(filtered, 0), 3)}
sess.run(train_op, feed_dict=feed_dict)

Here are the error messages.

InvalidArgumentError: indices[0] = -1 is not in [0, 2)
	 [[Node: gradients/MultivariateNormalDiag_3/prob/Prod_grad/Gather = Gather[Tindices=DT_INT32, Tparams=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/MultivariateNormalDiag_3/prob/Prod_grad/Shape, gradients/MultivariateNormalDiag_3/prob/Prod_grad/Reshape)]]

Caused by op 'gradients/MultivariateNormalDiag_3/prob/Prod_grad/Gather', defined at:
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\spyder\utils\ipython\start_kernel.py", line 231, in <module>
    main()
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\spyder\utils\ipython\start_kernel.py", line 227, in main
    kernel.start()
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\zmq\eventloop\ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tornado\ioloop.py", line 832, in start
    self._run_callback(self._callbacks.popleft())
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tornado\ioloop.py", line 605, in _run_callback
    ret = callback()
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\kernelbase.py", line 265, in enter_eventloop
    self.eventloop(self)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\eventloops.py", line 106, in loop_qt5
    return loop_qt4(kernel)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\eventloops.py", line 99, in loop_qt4
    _loop_qt(kernel.app)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\eventloops.py", line 83, in _loop_qt
    app.exec_()
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\eventloops.py", line 39, in process_stream_events
    kernel.do_one_iteration()
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\kernelbase.py", line 298, in do_one_iteration
    stream.flush(zmq.POLLIN, 1)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\zmq\eventloop\zmqstream.py", line 352, in flush
    self._handle_recv()
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2698, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2802, in run_ast_nodes
    if self.run_code(code, result):
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-de90f1cd6d46>", line 40, in <module>
    train_op = tf.train.AdamOptimizer().minimize(loss)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\optimizer.py", line 315, in minimize
    grad_loss=grad_loss)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\optimizer.py", line 386, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 540, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 346, in _MaybeCompile
    return grad_fn()  # Exit early
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 540, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_grad.py", line 129, in _ProdGrad
    reduced_num = math_ops.reduce_prod(array_ops.gather(input_shape, reduced))
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 1179, in gather
    validate_indices=validate_indices, name=name)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

...which was originally created as op 'MultivariateNormalDiag_3/prob/Prod', defined at:
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\spyder\utils\ipython\start_kernel.py", line 231, in <module>
    main()
[elided 23 identical lines from previous traceback]
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-de90f1cd6d46>", line 33, in <module>
    kernel = mvn.prob(inds)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\distributions\distribution.py", line 712, in prob
    return self._call_prob(value, name)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\distributions\distribution.py", line 694, in _call_prob
    return self._prob(value, **kwargs)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\distributions\util.py", line 688, in _fn
    return fn(*args, **kwargs)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\contrib\distributions\python\ops\mvn_linear_operator.py", line 216, in _prob
    return super(MultivariateNormalLinearOperator, self)._prob(x)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\distributions\transformed_distribution.py", line 406, in _prob
    prob = math_ops.reduce_prod(prob, self._reduce_event_indices)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1392, in reduce_prod
    name=name)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 1488, in _prod
    keep_dims=keep_dims, name=name)
  File "C:\Users\pasca\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
    op_def=op_def)

InvalidArgumentError (see above for traceback): indices[0] = -1 is not in [0, 2)
	 [[Node: gradients/MultivariateNormalDiag_3/prob/Prod_grad/Gather = Gather[Tindices=DT_INT32, Tparams=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/MultivariateNormalDiag_3/prob/Prod_grad/Shape, gradients/MultivariateNormalDiag_3/prob/Prod_grad/Reshape)]]

@ktarplee
Copy link
Author

v1.2.1 does not have the fix. The fix is on master. You need to cherry-pick the commit I mentioned above (ea79ba4) onto your v1.2.1 branch.

@pascalxia
Copy link

Oh now I see! Thank you so much @ktarplee !

jhseu pushed a commit to jhseu/tensorflow that referenced this issue Jul 18, 2017
JVillella pushed a commit to JVillella/tensorflow that referenced this issue Sep 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stat:awaiting tensorflower Status - Awaiting response from tensorflower type:bug Bug
Projects
None yet
Development

No branches or pull requests

7 participants