Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resubmit "Keras grouped convolutions" #39516

Merged

Conversation

lgeiger
Copy link
Contributor

@lgeiger lgeiger commented May 13, 2020

This PR resubmits the changes from #36773 which were rolled back in dd2ea87.

I couldn't reproduce the failure mentioned in #36773 (comment) locally with or without XLA, so I am resubmitting the changes to see if CI is happy now.
@tanzhenyu could you take a look?

@google-ml-butler google-ml-butler bot added the size:L CL Change Size: Large label May 13, 2020
@gbaned gbaned self-assigned this May 14, 2020
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation May 14, 2020
@gbaned gbaned requested a review from tanzhenyu May 14, 2020 05:34
@lgeiger
Copy link
Contributor Author

lgeiger commented May 18, 2020

@tanzhenyu Any chance you could approve this so CI can run? Since this is a git revert of the commit that rolled back the merge of #36773, there should be no need for an indepth review again.

@gbaned gbaned added the awaiting review Pull request awaiting review label May 19, 2020
@lgeiger lgeiger force-pushed the resubmit-keras-grouped-conv branch from 2948172 to 5c66ce9 Compare May 22, 2020 22:15
@lgeiger lgeiger changed the title Revert "Keras grouped convolutions rollback" Resubmit "Keras grouped convolutions" May 22, 2020
@lgeiger lgeiger force-pushed the resubmit-keras-grouped-conv branch from 5c66ce9 to bbae7c7 Compare May 22, 2020 22:24
@lgeiger lgeiger force-pushed the resubmit-keras-grouped-conv branch from bbae7c7 to cdeac44 Compare May 22, 2020 22:39
@lgeiger
Copy link
Contributor Author

lgeiger commented May 22, 2020

I rebased the PR and resolved the merge conflicts. @gbaned is there anything that still blocks this?

@tanzhenyu
Copy link
Contributor

tanzhenyu commented May 22, 2020

I'm not sure why it didn't show up in external CI. But here's what we got:

  File "/tensorflow/python/keras/layers/convolutional_test.py", line 296, in test_conv3d
    self._run_test(kwargs, expected_output_shape)
  File "/tensorflow/python/keras/layers/convolutional_test.py", line 265, in _run_test
    expected_output_shape=expected_output_shape)
  File "/tensorflow/python/framework/test_util.py", line 1717, in decorated
    result = f(self, *args, **kwargs)
  File "/tensorflow/python/keras/testing_utils.py", line 227, in layer_test
    model.train_on_batch(input_data, actual_output)
  File "/tensorflow/python/keras/engine/training.py", line 1476, in train_on_batch
    logs = train_function(iterator)
  File "/tensorflow/python/eager/def_function.py", line 766, in __call__
    result = self._call(*args, **kwds)
  File "/tensorflow/python/eager/def_function.py", line 826, in _call
    return self._stateless_fn(*args, **kwds)
  File "/tensorflow/python/eager/function.py", line 2812, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/tensorflow/python/eager/function.py", line 1838, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "/tensorflow/python/eager/function.py", line 1915, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/tensorflow/python/eager/function.py", line 549, in call
    ctx=ctx)
  File "/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnimplementedError:  Hit a case for convolution that is not implemented on GPU.
	 [[{{node cluster_131_1/xla_compile}}]] [Op:__inference_train_function_7135]

Either this doesn't pass the GPU test, or doesn't pass XLA test

@lgeiger
Copy link
Contributor Author

lgeiger commented May 23, 2020

@tanzhenyu Thanks for taking another look, very much appreciated.

After some digging I was able to run the tests with XLA on a cloud VM and could reproduce the failure locally. It looks like currently train_on_batch fails for 3D grouped convolutions when XLA is enabled.
Since the underlying failure is not related to this PR and would also appear when using grouped convolutions with tf.nn.conv3d on master, I disbled the relevant part of the test in 215161f. Please let me know if that works for now.

@tensorflowbutler tensorflowbutler removed the awaiting review Pull request awaiting review label May 25, 2020
PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer May 26, 2020
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels May 26, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label May 26, 2020
@lgeiger
Copy link
Contributor Author

lgeiger commented May 26, 2020

Thanks for approving, looks like CI is also happy this time 💚

@tanzhenyu
Copy link
Contributor

Thanks for approving, looks like CI is also happy this time 💚

Great, thanks for fixing!

@google-ml-butler google-ml-butler bot added the kokoro:force-run Tests on submitted change label May 26, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label May 26, 2020
@tensorflow-copybara tensorflow-copybara merged commit e0dc15c into tensorflow:master May 29, 2020
@lgeiger lgeiger deleted the resubmit-keras-grouped-conv branch May 29, 2020 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes ready to pull PR ready for merge process size:L CL Change Size: Large
Projects
PR Queue
  
Approved by Reviewer
Development

Successfully merging this pull request may close these issues.

None yet

8 participants