Atrous convolution does not preserve tensor shape #4742
Comments
Indeed I have reproduced the problem. I believe @gpapan implemented atrous_conv2d, and might have thoughts on how easy this would be to fix.
Can I work on this? I think I can solve this.
That would be great @AnishShah, can you describe what the problem is?
@AnishShah, do you have any updates?
I tried, but was not able to solve it. Sorry.
I think this may also shrink the dimensions? I believe a source of confusion here is the varying definition of atrous convolution depending on the paper being read. Basically, some papers defined atrous convolution incorrectly when they really meant dilated convolution. This is explained in Multi-Scale Context Aggregation by Dilated Convolutions, with the authors' implementation at https://github.com/fyu/dilation. Also see the related issue #3492. I think what people are hoping for is a new function, or perhaps simply an additional parameter on the atrous function, that specifies a constant scale for the output, so this behaves the same as in those papers where the output dimensions are the same as the input; this is particularly useful for semantic segmentation. I believe this is implemented in … @vrv or @tatatodd, regarding the TensorFlow API design: is this version of dilated convolution with constant input/output dimensions supported directly in …
I'm going to delegate to @gpapan on this one, who knows more about semantic segmentation and atrous conv :). I would suggest we'd need a totally separate function because of potential confusion between atrous and dilated. That being said, perhaps someone could just post a good implementation of it here for now instead of having to add it to the API? (Usually, a good sniff test for adding something to the API is whether it's used in / fundamental to a state-of-the-art model for an important problem; otherwise everything under the sun gets added to the core API and our team can't support it all.)
@ahundt In TensorFlow, "atrous convolution" and "dilated convolution" are used as synonyms, meaning "dilated convolution" as in the Multi-Scale Context Aggregation by Dilated Convolutions paper you cited.

@AnishShah tf.nn.convolution now provides a more generic interface for atrous convolution in any number of dimensions, and I believe it has slightly more complete shape inference, but there are still cases where it does not infer some of the output shape dimensions even when it could. If you are going to add better shape inference code, I suggest adding it to tf.nn.convolution, as there is separate work underway (see #7545) to make atrous_conv2d simply forward to tf.nn.convolution.

To fix it you will need to use the set_shape function on the output tensors to set the additional shape information. I think it would be possible to do this inside with_space_to_batch, specifically on the input_converted tensor and then again on the result_converted tensor. You will unfortunately have to duplicate some of the work done in calculating the shapes for space_to_batch_nd and batch_to_space_nd. The reason is that a tensor can be either constant or non-constant, but not partially constant.
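The set_shape fix described above amounts to merging the shape the op inferred with the extra dimensions that are statically deducible. A minimal sketch of those merge semantics only, not of TF internals (`merge_shapes` is a hypothetical helper; `None` stands in for an unknown dimension):

```python
def merge_shapes(inferred, known):
    """Combine two partial shapes, set_shape-style: None means
    'unknown'; defined dimensions must agree or an error is raised."""
    if len(inferred) != len(known):
        raise ValueError("rank mismatch")
    merged = []
    for a, b in zip(inferred, known):
        if a is None:
            merged.append(b)
        elif b is None or a == b:
            merged.append(a)
        else:
            raise ValueError("incompatible dimensions: %r vs %r" % (a, b))
    return tuple(merged)

# atrous_conv2d on a (None, 256, 256, 1) input currently reports
# (None, None, None, 1); with 'SAME' padding the spatial dimensions
# are recoverable from the input shape:
out_shape = merge_shapes((None, None, None, 1), (None, 256, 256, 1))
# -> (None, 256, 256, 1)
```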
@jbms Thanks for your comment. Does the current code in with_space_to_batch or #7545 have a mode where the output tensor has the same dimensions as the input tensor? This is the case for conv2d_same in …
I think it would be very productive to add a note in with_space_to_batch explaining what the output dimensions would be relative to the input dimensions, as they vary by configuration. Regarding your comment on atrous vs. dilated convolutions, I quoted the following from a footnote in Multi-Scale Context Aggregation by Dilated Convolutions: …
Perhaps this is a bit pedantic, but if the paper states this correctly, wouldn't it mean TensorFlow is mistaken in using atrous and dilated as synonyms? The footnote seems to imply that what it describes as the atrous algorithm only dilates the filter, while the dilated version can be configured so the output is the same size as the input.
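To make the size question concrete: with 'SAME' padding the output spatial size depends only on the input size and stride, regardless of the dilation rate, while 'VALID' shrinks the output by the effective (dilated) kernel extent. A small sketch of those formulas (helper names are my own, not TF API):

```python
import math

def effective_kernel_size(k, rate):
    # a k-tap kernel dilated by `rate` spans (k - 1) * rate + 1 input pixels
    return (k - 1) * rate + 1

def same_output_size(in_size, stride=1):
    # 'SAME' padding: output size is independent of kernel size and rate
    return math.ceil(in_size / stride)

def valid_output_size(in_size, k, rate=1, stride=1):
    # 'VALID' padding: only positions where the whole dilated kernel fits
    return (in_size - effective_kernel_size(k, rate)) // stride + 1

# e.g. a 256x256 input with a 6x6 kernel at rate 2:
print(same_output_size(256))               # -> 256
print(valid_output_size(256, 6, rate=2))   # -> 246
```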
Okay, I answered my own question: yes, both. I made this test and ran it on TF 1.0, which does confirm the original issue:

```python
import tensorflow as tf
import numpy as np

input_img_np = np.random.random((1, 256, 256, 1)).astype(np.float32)
kernel = np.random.random((6, 6, 1, 1)).astype(np.float32)

with tf.Session() as sess:
    concrete_input_op = tf.constant(input_img_np)
    concrete_output_op = tf.nn.convolution(concrete_input_op, kernel, padding='SAME', dilation_rate=np.array([2, 2]))
    concrete_output = sess.run(concrete_output_op)
    print('convolution + CONCRETE + SAME')
    print('concrete_input_op: ', concrete_input_op.get_shape())
    print('concrete_output_op: ', concrete_output_op.get_shape())
    print('concrete_output:', concrete_output.shape)
    assert concrete_input_op.get_shape() == concrete_output_op.get_shape()

    undef_input_op = tf.placeholder(tf.float32, shape=(None, 256, 256, 1))
    undef_output_op = tf.nn.convolution(undef_input_op, kernel, padding='SAME', dilation_rate=np.array([2, 2]))
    undef_output = sess.run(undef_output_op, feed_dict={undef_input_op: input_img_np})
    print('convolution + UNDEF + SAME')
    print('undef_input_op: ', undef_input_op.get_shape())
    print('undef_output_op: ', undef_output_op.get_shape())
    print('undef_output:', undef_output.shape)
    # This assert will fail even though the shapes are actually ok,
    # because the output shape is only partially known:
    # assert undef_input_op.get_shape() == undef_output_op.get_shape()

    valid_concrete_input_op = tf.constant(input_img_np)
    valid_concrete_output_op = tf.nn.convolution(valid_concrete_input_op, kernel, padding='VALID', dilation_rate=np.array([2, 2]))
    valid_concrete_output = sess.run(valid_concrete_output_op)
    print('convolution + CONCRETE + VALID')
    print('valid_concrete_input_op: ', valid_concrete_input_op.get_shape())
    print('valid_concrete_output_op: ', valid_concrete_output_op.get_shape())
    print('valid_concrete_output:', valid_concrete_output.shape)

    valid_undef_input_op = tf.placeholder(tf.float32, shape=(None, 256, 256, 1))
    valid_undef_output_op = tf.nn.convolution(valid_undef_input_op, kernel, padding='VALID', dilation_rate=np.array([2, 2]))
    valid_undef_output = sess.run(valid_undef_output_op, feed_dict={valid_undef_input_op: input_img_np})
    print('convolution + UNDEF + VALID')
    print('valid_undef_input_op: ', valid_undef_input_op.get_shape())
    print('valid_undef_output_op: ', valid_undef_output_op.get_shape())
    print('valid_undef_output:', valid_undef_output.shape)
    # assert valid_undef_input_op.get_shape() == valid_undef_output_op.get_shape()

    ##########################################################################
    # Now atrous
    concrete_input_op = tf.constant(input_img_np)
    concrete_output_op = tf.nn.atrous_conv2d(concrete_input_op, kernel, padding='SAME', rate=2)
    concrete_output = sess.run(concrete_output_op)
    print('atrous_conv2d + CONCRETE + SAME')
    print('concrete_input_op: ', concrete_input_op.get_shape())
    print('concrete_output_op: ', concrete_output_op.get_shape())
    print('concrete_output:', concrete_output.shape)
    assert concrete_input_op.get_shape() == concrete_output_op.get_shape()

    undef_input_op = tf.placeholder(tf.float32, shape=(None, 256, 256, 1))
    undef_output_op = tf.nn.atrous_conv2d(undef_input_op, kernel, padding='SAME', rate=2)
    undef_output = sess.run(undef_output_op, feed_dict={undef_input_op: input_img_np})
    print('atrous_conv2d + UNDEF + SAME')
    print('undef_input_op: ', undef_input_op.get_shape())
    print('undef_output_op: ', undef_output_op.get_shape())
    print('undef_output:', undef_output.shape)
    # assert undef_input_op.get_shape() == undef_output_op.get_shape()

    valid_concrete_input_op = tf.constant(input_img_np)
    valid_concrete_output_op = tf.nn.atrous_conv2d(valid_concrete_input_op, kernel, padding='VALID', rate=2)
    valid_concrete_output = sess.run(valid_concrete_output_op)
    print('atrous_conv2d + CONCRETE + VALID')
    print('valid_concrete_input_op: ', valid_concrete_input_op.get_shape())
    print('valid_concrete_output_op: ', valid_concrete_output_op.get_shape())
    print('valid_concrete_output:', valid_concrete_output.shape)

    valid_undef_input_op = tf.placeholder(tf.float32, shape=(None, 256, 256, 1))
    valid_undef_output_op = tf.nn.atrous_conv2d(valid_undef_input_op, kernel, padding='VALID', rate=2)
    valid_undef_output = sess.run(valid_undef_output_op, feed_dict={valid_undef_input_op: input_img_np})
    print('atrous_conv2d + UNDEF + VALID')
    print('valid_undef_input_op: ', valid_undef_input_op.get_shape())
    print('valid_undef_output_op: ', valid_undef_output_op.get_shape())
    print('valid_undef_output:', valid_undef_output.shape)
    # assert valid_undef_input_op.get_shape() == valid_undef_output_op.get_shape()
```

Which produces this output with the additional …
@ahundt, in TF atrous and dilated convolution mean the same thing. One of the parameters that they accept is … There are different ways to implement dilated convolution; TF implements it by sampling the input feature map, which is described in the DeepLab paper. The piece of code that you refer to actually uses this implementation under the hood. At the same time, dilated convolution is itself an ordinary convolution, meaning that if you apply it with the … In the case of image segmentation, dilated convolution is used to make it possible to use weights …
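The point that a dilated convolution is still an ordinary convolution, just with zeros inserted between the kernel taps, can be checked directly. A 1-D NumPy sketch (helper names are my own, not TF API):

```python
import numpy as np

def dilate_kernel(k, rate):
    """Insert (rate - 1) zeros between the taps of a 1-D kernel."""
    out = np.zeros((len(k) - 1) * rate + 1, dtype=k.dtype)
    out[::rate] = k
    return out

def correlate_valid(x, k):
    """Plain 'VALID' cross-correlation, 1-D."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def dilated_correlate_valid(x, k, rate):
    """Dilated cross-correlation: sample the input with stride `rate`
    under each kernel tap, without materializing the zero-stuffed kernel."""
    span = (len(k) - 1) * rate + 1
    return np.array([sum(k[j] * x[i + j * rate] for j in range(len(k)))
                     for i in range(len(x) - span + 1)])

x = np.arange(10, dtype=np.float64)
k = np.array([1.0, 2.0, 3.0])
a = dilated_correlate_valid(x, k, rate=2)
b = correlate_valid(x, dilate_kernel(k, rate=2))
assert np.allclose(a, b)  # both paths agree
```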
@warmspringwinds Thanks! Got it now. All, sorry I ended up hijacking the issue due to the mismatch between my mental model and the design. At least a test script came from it and I learned something, thanks for the clarifications. :-)
@ahundt The precise definition of with_space_to_batch is given in the docstring. However, the actual output dimensions depend entirely on the behavior of the underlying …
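For intuition about what with_space_to_batch does, here is a 1-D sketch of the identity it relies on: a dilated convolution equals space_to_batch, then an ordinary convolution, then batch_to_space. This is an illustrative NumPy model (helper names are my own, and it assumes the input length is a multiple of the rate and skips the padding the real op performs), not the TF implementation:

```python
import numpy as np

def correlate_valid(x, k):
    # plain 'VALID' cross-correlation, 1-D
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def dilated_correlate_valid(x, k, rate):
    # direct dilated cross-correlation: stride `rate` under each kernel tap
    span = (len(k) - 1) * rate + 1
    return np.array([sum(k[j] * x[i + j * rate] for j in range(len(k)))
                     for i in range(len(x) - span + 1)])

def atrous_via_space_to_batch(x, k, rate):
    assert len(x) % rate == 0  # keep the sketch simple: no padding step
    # "space_to_batch": split x into `rate` interleaved sub-sequences
    batches = [x[b::rate] for b in range(rate)]
    # ordinary (undilated) convolution on each sub-sequence
    ys = [correlate_valid(xb, k) for xb in batches]
    # "batch_to_space": re-interleave the per-batch outputs
    out = np.empty(sum(len(y) for y in ys))
    for b, y in enumerate(ys):
        out[b::rate] = y
    return out

x = np.arange(10, dtype=np.float64)
k = np.array([1.0, 2.0, 3.0])
direct = dilated_correlate_valid(x, k, rate=2)
via_s2b = atrous_via_space_to_batch(x, k, rate=2)
assert np.allclose(direct, via_s2b)  # the two routes coincide
```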
#8411 doesn't actually fix this issue; it just adds some documentation, but does not improve the static tensor shape information, which is what this issue is about.
Sorry about that.
Sorry, that was actually my fault! I meant to write that it resolves a point of confusion discussed in this issue.
No, it's my fault, I did edit your description to make the PR close this issue. That was a little optimistic.
Aha, didn't realize that; well, at least things are set correctly now.
This seems to be fixed in at least tensorflow version 1.1.0-rc2!
It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.
Closing since this seems obsolete, but please reopen if it needs attention.
For an input with an undefined batch size, atrous_conv2d emits tensors where all except the final dimension are undefined. (For concrete batch sizes, everything works as expected.)

Tested on 0.10.0rc0.