
Make tf.transpose emit simpler graph when possible #21945

Merged
merged 1 commit into tensorflow:master on Oct 4, 2018

Conversation

efagerho
Contributor

If not given an explicit 'perm' parameter, tf.transpose currently
emits a graph that dynamically calculates it from the rank of the
input tensor. This is completely unnecessary when the rank of the
input can be statically determined at graph construction time.

Modify tf.transpose to emit 'perm' as a single Const node whenever
possible.
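
As an illustration (not part of the patch itself), here is a minimal TF 1.x sketch of the graph difference: when the input's rank is statically known, perm no longer has to be computed by Rank/Range/Sub nodes at run time.

    import tensorflow as tf

    g = tf.Graph()
    with g.as_default():
      x = tf.placeholder(tf.float32, shape=[2, 3])  # rank statically known
      y = tf.transpose(x)                           # no explicit perm given

    print([op.type for op in g.get_operations()])
    # Before this change: the list includes Rank, Range, and Sub ops that
    # compute perm at run time.
    # After this change: perm is a single Const feeding the Transpose op.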

@aaroey aaroey requested a review from alextp August 31, 2018 05:26
@aaroey aaroey self-assigned this Aug 31, 2018
@alextp alextp added the awaiting testing (then merge), kokoro:force-run, and ready to pull labels Sep 4, 2018
@kokoro-team kokoro-team removed the kokoro:force-run label Sep 4, 2018
Review thread on tensorflow/python/ops/array_ops.py (outdated, resolved)
@alextp alextp added the kokoro:force-run label Sep 13, 2018
@kokoro-team kokoro-team removed the kokoro:force-run label Sep 13, 2018
@efagerho
Contributor Author

efagerho commented Sep 15, 2018

EDIT: It looks like //tensorflow/contrib/learn:dnn_test fails with the patch, in addition to the other pre-existing failures under contrib. I need to debug this further, since I can't yet figure out the root cause.

@efagerho
Contributor Author

efagerho commented Sep 17, 2018

It looks like there are a few tests that this patch causes to fail. They all raise an exception in the same place:

tensorflow/contrib/learn/python/learn/estimators/head.py", line 1924, in _centered_bias_step

What's strange is that the code that builds the graph doesn't fail when tf.transpose is called, i.e. the graph node is created just as expected, so its input parameters appear to be valid. Having gone through every such call with some good old print debugging, the parameters don't look strange at all. The exception in the test is raised when the optimizer is creating the backprop graph for the bias computation, and at that point it looks like some variable and its gradient have different shapes:

File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/contrib/learn/python/learn/estimators/dnn_test.py", line 1562, in testEnableCenteredBias
    regressor.fit(input_fn=_input_fn, steps=5)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 525, in fit
    loss = self._train_model(input_fn=input_fn, hooks=hooks)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1042, in _train_model
    model_fn_ops = self._get_train_ops(features, labels)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1265, in _get_train_ops
    return self._call_model_fn(features, labels, model_fn_lib.ModeKeys.TRAIN)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1228, in _call_model_fn
    model_fn_results = self._model_fn(features, labels, **kwargs)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/contrib/learn/python/learn/estimators/dnn.py", line 214, in _dnn_model_fn
    logits=logits)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/contrib/learn/python/learn/estimators/head.py", line 758, in create_model_fn_ops
    enable_centered_bias=self._enable_centered_bias)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/contrib/learn/python/learn/estimators/head.py", line 669, in _create_model_fn_ops
    batch_size, loss_fn, weight_tensor)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/contrib/learn/python/learn/estimators/head.py", line 1940, in _train_op
    weights=weights)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/contrib/learn/python/learn/estimators/head.py", line 1924, in _centered_bias_step
    centered_bias_loss, var_list=(centered_bias,), name=name)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/python/training/optimizer.py", line 410, in minimize
    name=name)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/python/training/optimizer.py", line 607, in apply_gradients
    update_ops.append(processor.update_op(self, grad))
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/python/training/optimizer.py", line 115, in update_op
    update_op = optimizer._apply_dense(g, self._v)  # pylint: disable=protected-access
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/python/training/adagrad.py", line 103, in _apply_dense
    use_locking=self._use_locking)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/python/training/gen_training_ops.py", line 174, in apply_adagrad
    use_locking=use_locking, update_slots=update_slots, name=name)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): var and grad do not have the same shape[1] []
         [[node dnn/regression_head/centered_bias_step/update_dnn/regression_head/centered_bias_weight/ApplyAdagrad (defined at /home/efagerholm/.cache/bazel/_bazel_efagerholm/3bd66cc293ffd5c1e1b6be4e441d09f4/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/contrib/learn/dnn_test.runfiles/org_tensorflow/tensorflow/contrib/learn/python/learn/estimators/head.py:1924)  = ApplyAdagrad[T=DT_FLOAT, _class=["loc:@dnn/r...plyAdagrad"], update_slots=true, use_locking=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](dnn/regression_head/centered_bias_weight, dnn/regression_head/dnn/regression_head/centered_bias_weight/Adagrad, dnn/regression_head/centered_bias_step/learning_rate, dnn/regression_head/gradients/dnn/regression_head/centered_bias_step/Tile_grad/Sum)]]

I don't quite understand how it's possible to have a valid forward graph and then have the optimizer end up with mismatched variable and gradient shapes during backprop. I'll look into this more closely later this week.

@efagerho
Contributor Author

It looks like I've triggered a bug in TensorFlow (probably the Grappler optimizer). The following code fails with the error in the message above:

      a = ops.convert_to_tensor(a, name="a") 
      if not a.get_shape().ndims: 
        rank = gen_array_ops.rank(a) 
        perm = (rank - 1) - gen_math_ops._range(0, rank, 1) 
      else: 
        rank = a.get_shape().ndims 
        perm = (rank - 1) - np.arange(rank, dtype=np.int32) 

However, if I simply add a tf.Print on the perm parameter it works, i.e. the following code passes unit tests:

      a = ops.convert_to_tensor(a, name="a") 
      if not a.get_shape().ndims: 
        rank = gen_array_ops.rank(a) 
        perm = (rank - 1) - gen_math_ops._range(0, rank, 1) 
      else: 
        rank = a.get_shape().ndims 
        perm = (rank - 1) - np.arange(rank, dtype=np.int32) 
        from tensorflow.python.ops import logging_ops 
        perm = logging_ops.Print(perm, [perm], "sdfsdf") 

@alextp
Contributor

alextp commented Sep 17, 2018 via email

@alextp
Contributor

alextp commented Sep 17, 2018

I suggest this because I think it's not a Grappler-related bug but rather an issue where some piece of code downstream behaves differently depending on whether perm is a tensor or not, and Print makes it a tensor.
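
For illustration only, a hypothetical sketch of the kind of divergence meant here; the helper below is made up for this example, not actual TensorFlow code:

    import numpy as np
    from tensorflow.python.framework import ops

    def handle_perm(perm):
      # Hypothetical downstream helper: behaves differently depending on
      # whether perm is already a Tensor (e.g. the output of tf.Print) or a
      # plain numpy array computed at graph construction time.
      if isinstance(perm, ops.Tensor):
        return perm
      return np.asarray(perm, dtype=np.int32)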

@efagerho
Contributor Author

Can you fix this by doing perm = ops.convert_to_tensor((rank - 1) - np.arange(rank, dtype=np.int32))?

I should have mentioned that I already tried this; it doesn't help. In fact, I tried the following things:

1. perm = logging_ops.Print(perm, [perm], "sdfsdf")
2. perm = constant(perm)
3. perm = identity(perm)
4. perm = ops.convert_to_tensor(perm)

The unit tests only pass with (1); the others all fail. The sketch below shows where each variant goes.
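
For clarity, a sketch of where each attempt goes in the patched branch; the helper name is illustrative, the numbers match the list above, and the imports cover the commented-out variants as well:

    import numpy as np
    from tensorflow.python.framework import constant_op, ops
    from tensorflow.python.ops import array_ops, logging_ops

    def _static_perm(a):
      # 'a' is a tensor whose rank is known at graph construction time.
      rank = a.get_shape().ndims
      perm = (rank - 1) - np.arange(rank, dtype=np.int32)
      perm = logging_ops.Print(perm, [perm], "sdfsdf")   # (1) tests pass
      # Each of these was tried in place of the Print above; tests still fail:
      # perm = constant_op.constant(perm)                # (2)
      # perm = array_ops.identity(perm)                  # (3)
      # perm = ops.convert_to_tensor(perm)               # (4)
      return perm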

@efagerho
Contributor Author

Since tf.Print() is basically tf.identity(), I wonder whether there could be some strange device placement issue going on here. However, I'm running the tests with "--config=opt", so there's really only the CPU to choose from, and I can't see how that could factor in either.

@alextp
Contributor

alextp commented Sep 17, 2018

@rmlarsen is there someone on the grappler side who can help investigate this failure?

@rmlarsen
Member

rmlarsen commented Oct 1, 2018

@efagerho thanks for the PR and sorry for the delay. Let me take a look.

@rmlarsen
Member

rmlarsen commented Oct 1, 2018

This does appear to be a Grappler bug. The tests pass when I disable all Grappler optimizations. I will hunt down and squash the bug now.
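
For reference, a minimal sketch of one way to disable all Grappler optimizations in TF 1.x (not necessarily how it was done for this run):

    import tensorflow as tf
    from tensorflow.core.protobuf import rewriter_config_pb2

    # Turn off the entire Grappler meta-optimizer for sessions using this config.
    rewriter_config = rewriter_config_pb2.RewriterConfig(disable_meta_optimizer=True)
    graph_options = tf.GraphOptions(rewrite_options=rewriter_config)
    config = tf.ConfigProto(graph_options=graph_options)

    with tf.Session(config=config) as sess:
      pass  # build and run the failing graph here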

@rmlarsen
Member

rmlarsen commented Oct 3, 2018

I believe this was caused by a bug in the shape function of Transpose. I will submit a fix shortly. Then we should be able to proceed with this PR.

@efagerho
Contributor Author

efagerho commented Oct 3, 2018

I believe this was caused by a bug in the shape function of Transpose. I will submit a fix shortly. Then we should be able to proceed with this PR.

That's quite unexpected. I would have assumed that code to be fairly well exercised. Thanks for figuring it out!

@rmlarsen
Member

rmlarsen commented Oct 3, 2018

@efagerho indeed!

@rmlarsen
Member

rmlarsen commented Oct 3, 2018

@efagerho @alextp it looks like fixing the shape function was not enough, and that there is a separate bug in the Grappler shape inference or constant folding. :-(
I'll keep digging.

@rmlarsen
Member

rmlarsen commented Oct 3, 2018

@efagerho @alextp OK, found the second bug: it's in reduction index materialization (a part of Grappler constant folding).

@rmlarsen
Member

rmlarsen commented Oct 4, 2018

@efagerho @alextp I have submitted the bugfix for Grappler and we can proceed. I have verified that this change now works, but let's keep it as a PR so you get credited for it.

@tensorflow-copybara tensorflow-copybara merged commit 864e290 into tensorflow:master Oct 4, 2018
tensorflow-copybara pushed a commit that referenced this pull request Oct 4, 2018
PiperOrigin-RevId: 215824410
@rmlarsen
Member

rmlarsen commented Oct 4, 2018

@efagerho your PR has now been merged. Thanks for the contribution!

tensorflow-copybara pushed a commit that referenced this pull request Oct 5, 2018
Automated rollback of PR #21945
END_PUBLIC
Automated rollback of commit 863f614. Revert #21945.

PiperOrigin-RevId: 215913175
@efagerho
Contributor Author

efagerho commented Oct 8, 2018

Seems like the patch got rolled back. Were the Grappler fixes checked in before the CI ran?

@alextp
Contributor

alextp commented Oct 8, 2018

We're working on resubmitting it; there were some obscure test failures triggered by this.

Labels: cla: yes, ready to pull