Centered batch padding with tf.data.Dataset API #13969

Closed
jpuigcerver opened this Issue Oct 25, 2017 · 15 comments

@jpuigcerver
Contributor

jpuigcerver commented Oct 25, 2017

Hi,

First of all, kudos for the PaddedBatchDataset op and all the tf.data.Dataset API.

Right now, all the examples in the batch are aligned at coordinate (0, 0, ...). I think it is very convenient in many applications (e.g. image processing) to have the examples "centered" in the batch.

I would like to know if you would accept a pull request in that direction. If so, I am willing to work on this. I suggest adding a flag to the operator so that the user can decide whether she wants the data centered or aligned at (0, 0, ...), i.e. the current behavior. These changes should not be difficult to make.

@michaelisard

Member

michaelisard commented Oct 25, 2017

@mrry what do you think?

@mrry

Contributor

mrry commented Oct 26, 2017

If it would be useful, we could accept a custom transformation function in tf.contrib.data to add this feature (similar to how tf.contrib.data.batch_and_drop_remainder() or tf.contrib.data.dense_to_sparse_batch() are implemented). It might be good to discuss the proposed interface in this issue before going ahead with the implementation, because I imagine there could be other padding options, e.g. inspired by the features in tf.pad(), and it would be nice to come up with a generally useful transformation.

/cc @ebrevdo, a notorious padder, who might have additional desires in this regard.

@jpuigcerver

Contributor

jpuigcerver commented Oct 27, 2017

Indeed, there are different ways of padding. I think that in most cases, especially when your operations can deal with variable sizes within each batch, you are just fine with the current features of PaddedBatchDataset.

However, when some of your operations assume that all the examples in the minibatch have the same shape (e.g. LSTM wrapper for cudnn's LSTM), it is very useful to align the examples to share the same center coordinates. This is actually the problem that I'm facing now.

Some practical use cases:
A dataset containing three tensors of different sizes: [ [1, 1, 1, 1], [2, 2], [3] ]

  • Current padding:
dataset.padded_batch(3, [None])

would produce:

[ [1, 1, 1, 1],
  [2, 2, 0, 0],
  [3, 0, 0, 0] ]

  • Centered padding:
dataset.padded_batch(3, [None], centered=True)

would produce something like:

[ [1, 1, 1, 1],
  [0, 2, 2, 0],
  [0, 3, 0, 0] ]

Of course, one can think of many other alignment strategies, for instance aligning to the right. However, those can generally be implemented with the current padding plus pre/post map() operations.

Relative alignments, on the other hand (the centered case that I just showed is a special case of a relative alignment), cannot be implemented with the current API; that said, I can't think of a practical case where handling generic relative alignments would be useful.

That's why I suggested simply adding a boolean flag to the PaddedBatchDataset class to handle the centering. This would keep the API clean and cover most practical use cases that I can think of.
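For reference, the proposed centered semantics can be sketched outside tf.data in plain NumPy. This is only an illustration of the behavior the flag would enable; centered_padded_batch is a hypothetical name, not part of any API:

```python
import numpy as np

def centered_padded_batch(examples, pad_value=0):
    """Pad variable-length 1-D examples into one batch, centering each.

    A sketch of the semantics proposed for a hypothetical
    dataset.padded_batch(..., centered=True).
    """
    max_len = max(len(x) for x in examples)
    batch = np.full((len(examples), max_len), pad_value, dtype=np.int64)
    for i, x in enumerate(examples):
        # Split the padding evenly between the left and right sides.
        start = (max_len - len(x)) // 2
        batch[i, start:start + len(x)] = x
    return batch

print(centered_padded_batch([[1, 1, 1, 1], [2, 2], [3]]))
# [[1 1 1 1]
#  [0 2 2 0]
#  [0 3 0 0]]
```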

@michaelisard

Member

michaelisard commented Oct 27, 2017

@ebrevdo would you let us know if you have any comments before @jpuigcerver starts working on a patch?

@ebrevdo

Contributor

ebrevdo commented Oct 27, 2017

@jpuigcerver

Contributor

jpuigcerver commented Oct 27, 2017

Hi @ebrevdo,

To give some perspective, I've been using this trick for handwritten text recognition for a while (you can also think of speech recognition as an application potentially affected by this). The main reason is that, as with TF, Torch's wrapper around cuDNN's LSTM does not support examples of different lengths, so after padding one must assume that all examples have the same size.

This, by itself, is not a big issue. The problems arise when one uses bidirectional LSTMs and the length of the sequences varies a lot within each batch.

Notice that the left-to-right LSTM has to deal with lots of zeros at the end of each input sequence, while the right-to-left LSTM faces the same problem at the start of the sequence. Centering the examples within the batch mitigates this problem.

In terms of the loss (CTC), I just ignore the fact that I padded the sequences. In practice, this means that the model has to learn to be invariant to the amount of "zeros" at the beginning and the end of the examples.

Ideally, I would like all recurrent operations to support examples of multiple sizes within the batch, but that requires a significant effort from many other developers. In any case, I found that this simple trick of centering the features within the batch works quite well in practice, so it's good to have this option available when some recurrent op does not support multiple lengths.

Regarding the PaddedBatchDataset operation in TF, we could add an output_offset attribute, similar to output_types and output_shapes, that contains the offset of each element within the batch. The default (left-aligned padding) would be a zero tensor. This, together with output_shapes, would be enough if the user wants to account for the padding in any masking done by later ops.
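The output_offset idea above can be sketched in NumPy: alongside the padded batch, return each element's offset so later ops can reconstruct a mask. All names here (padded_batch_with_offsets, the centered flag) are hypothetical illustrations, not existing API:

```python
import numpy as np

def padded_batch_with_offsets(examples, centered=False, pad_value=0):
    """Pad 1-D examples into a batch and report each element's offset.

    With centered=False the offsets are all zero (the current
    left-aligned behavior); with centered=True each element is shifted
    so its content sits in the middle of the row.
    """
    max_len = max(len(x) for x in examples)
    batch = np.full((len(examples), max_len), pad_value, dtype=np.int64)
    offsets = np.zeros(len(examples), dtype=np.int64)
    for i, x in enumerate(examples):
        if centered:
            offsets[i] = (max_len - len(x)) // 2
        batch[i, offsets[i]:offsets[i] + len(x)] = x
    return batch, offsets

batch, offsets = padded_batch_with_offsets(
    [[1, 1, 1, 1], [2, 2], [3]], centered=True)
print(offsets)  # [0 1 1]
```

Together with the per-element lengths, these offsets are exactly what a masking op downstream would need to distinguish real values from padding.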

@ebrevdo

Contributor

ebrevdo commented Oct 27, 2017

@tensorflowbutler

Member

tensorflowbutler commented Dec 20, 2017

It has been 14 days with no activity and the awaiting tensorflower label was assigned. Please update the label and/or status accordingly.


@mrry

Contributor

mrry commented Jan 4, 2018

Closing due to lack of activity. @jpuigcerver Feel free to reopen if you pick this up again.

@mrry mrry closed this Jan 4, 2018

@StphMe

StphMe commented Mar 6, 2018

I currently encounter the same issue, and not only for LSTMs.

When you train an arbitrary convolutional network that needs fixed-size image inputs (i.e. it has a dense layer inside) on images from a dataset object, you can use dataset.padded_batch with None or -1 in the corresponding places to pad your data with some constant value up to the biggest dimensions in the batch. However, the type of padding is not customizable. As described above, the padding values (i.e. zeros) are placed on the right and the bottom of the image:

A picture with
[[1,1,1,1]
[1,1,1,1]
[1,1,1,1]
[1,1,1,1]]

padded by 2 would look like:
[[1,1,1,1,0,0]
[1,1,1,1,0,0]
[1,1,1,1,0,0]
[1,1,1,1,0,0]
[0,0,0,0,0,0]
[0,0,0,0,0,0]]

The problem is that if you train a network on images padded like this, the net will eventually learn that the right and bottom parts of the input contribute little to the classification or labeling of the image, so the weights whose receptive fields cover the black border will be neglected by the net.
At inference time, the classifier will be presented with images without this border, which means we'll get worse accuracy if important image features lie on the right or bottom side of the picture.
This effect could be minimized in 2 possible ways:
1.: Center padding.
i.e. creating a picture like this:
[[0,0,0,0,0,0]
[0,1,1,1,1,0]
[0,1,1,1,1,0]
[0,1,1,1,1,0]
[0,1,1,1,1,0]
[0,0,0,0,0,0]]
In this "centered" padded image the receptive field of neurons will see more of the picture at the borders and therefore reduce the described effect.
Note: this is similar to the suggestion of jpuigcerver above.
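As an illustration of option 1, a single image can be centered on a larger canvas with NumPy's np.pad by splitting the required padding between both sides of each axis (center_pad is an illustrative name; in a tf.data pipeline the analogous computation could be done with tf.pad inside a map()):

```python
import numpy as np

def center_pad(img, target_h, target_w):
    # Split the required padding evenly between the two sides of each
    # axis, so the image ends up centered on the target canvas.
    ph = target_h - img.shape[0]
    pw = target_w - img.shape[1]
    return np.pad(img,
                  ((ph // 2, ph - ph // 2), (pw // 2, pw - pw // 2)),
                  mode='constant')

img = np.ones((4, 4), dtype=int)
print(center_pad(img, 6, 6))  # 4x4 block of ones with a 1-pixel zero border
```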

2.: Mirror padding
The image is padded to the right and bottom as before, if necessary, but the padding values are not constant; they are mirrored from the input image.
e.g. for an input picture:
[[1,2,3,4]
[1,2,3,4]
[1,2,3,4]
[2,2,2,4]]
mirror padding (in the style of tf.pad's SYMMETRIC mode, which repeats the edge values) would create a picture like:
[[1,2,3,4,4,3]
[1,2,3,4,4,3]
[1,2,3,4,4,3]
[2,2,2,4,4,2]
[2,2,2,4,4,2]
[1,2,3,4,4,3]]
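For a single tensor, this mirrored behavior already exists outside the Dataset API; the sketch below uses NumPy's 'symmetric' mode, which reflects across the edge including the edge value (tf.pad offers the analogous "SYMMETRIC" mode):

```python
import numpy as np

img = np.array([[1, 2, 3, 4],
                [1, 2, 3, 4],
                [1, 2, 3, 4],
                [2, 2, 2, 4]])

# Pad 2 values on the right and bottom only, mirroring values across
# the edge (the edge value itself is repeated).
padded = np.pad(img, ((0, 2), (0, 2)), mode='symmetric')
print(padded)
```

The missing piece discussed in this issue is applying such a mode per example during batching, where each element's padding amount depends on the largest element in its batch.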

I think these features would really help improve the output of a network fed with padded images.

I would appreciate it if this issue could be reopened.

@nikste

Contributor

nikste commented Mar 6, 2018

+1 I also need this feature!

@jpuigcerver

Contributor

jpuigcerver commented Mar 6, 2018

Hi,

I'm so sorry for being silent for so long. Unfortunately, I'm writing my thesis now and I don't have much time for writing code. If anyone wants to work on this, I (and apparently others) think it's a very useful feature. If nobody has implemented it yet, I'll come back to this once I defend my thesis.

@anjany

anjany commented Aug 17, 2018

Hi @StphMe and all: did you, by any chance, figure out some hack for doing this using the existing API?

@StphMe

StphMe commented Aug 20, 2018

Hi @anjany, unfortunately not. I also had no time to contribute. You may try to create a workaround for your exact case using the tf.pad() function, which supports different types of padding: https://www.tensorflow.org/api_docs/python/tf/pad
