
[Deeplab] input_preprocess doesn't give the correct shape in evaluation mode #3939

Closed
jrabary opened this issue Apr 10, 2018 · 8 comments
@jrabary jrabary commented Apr 10, 2018

Describe the problem

Here is a snippet from input_preprocess.py in the deeplab model (from line 122):

  # Randomly crop the image and label.
  if is_training and label is not None:
    processed_image, label = preprocess_utils.random_crop(
        [processed_image, label], crop_height, crop_width)

  processed_image.set_shape([crop_height, crop_width, 3])

  if label is not None:
    label.set_shape([crop_height, crop_width, 1])

  if is_training:
    # Randomly left-right flip the image and label.
    processed_image, label, _ = preprocess_utils.flip_dim(
        [processed_image, label], _PROB_OF_FLIP, dim=1)

The crop is only performed in training mode, yet the processed_image shape is unconditionally set to [crop_height, crop_width, 3].

This causes a problem when we evaluate the xception65 model, which produces the following error:

InvalidArgumentError (see above for traceback): padded_shape[0]=45 is not divisible by block_shape[0]=2
	 [[Node: xception_65/exit_flow/block2/unit_1/xception_module/separable_conv1_depthwise/depthwise/SpaceToBatchND = SpaceToBatchND[T=DT_FLOAT, Tblock_shape=DT_INT32, Tpaddings=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](xception_65/exit_flow/block1/unit_1/xception_module/add, xception_65/exit_flow/block2/unit_1/xception_module/separable_conv1_depthwise/depthwise/SpaceToBatchND/block_shape, xception_65/exit_flow/block2/unit_1/xception_module/separable_conv1_depthwise/depthwise/SpaceToBatchND/paddings)]]

If we force the input image size to be [513, 513], it works.

This test was done with the PASCAL VOC dataset.
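
To illustrate why the set_shape call does not catch this, here is a minimal TF 1.x-style sketch (illustrative only, not deeplab code): set_shape only asserts a static shape and never resizes the tensor, so the graph is built for 513x513 while the runtime tensor can still be, say, 670x1000.

import tensorflow as tf

# Illustrative sketch: set_shape() only asserts a static shape; it does not
# crop, pad, or resize the underlying tensor.
g = tf.Graph()
with g.as_default():
  # After random scaling, the static shape is unknown:
  image = tf.compat.v1.placeholder(tf.float32, [None, None, 3])
  image.set_shape([513, 513, 3])
  print(image.shape)  # (513, 513, 3) statically, yet a 670x1000x3 image can
                      # still be fed at runtime; the mismatch only surfaces
                      # later, e.g. inside SpaceToBatchND.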

@aquariusjay aquariusjay commented Apr 10, 2018

During eval, we always do whole-image inference, meaning you need to set eval_crop_size >= largest image dimension.

We always set crop_size = output_stride * k + 1, where k is an integer. When working on PASCAL images, the largest dimension is 512. Thus, we set crop_size = 513 = 16 * 32 + 1 > 512. Similarly, we set eval_crop_size = 1025x2049 for Cityscapes images.
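
As a quick arithmetic check of the crop_size = output_stride * k + 1 rule, here is a small hypothetical helper (not part of deeplab) that picks the smallest valid crop size covering a given image dimension:

def smallest_valid_crop_size(largest_dim, output_stride=16):
  """Smallest crop_size of the form output_stride * k + 1 >= largest_dim."""
  k = max(1, -(-(largest_dim - 1) // output_stride))  # ceiling division
  return output_stride * k + 1

print(smallest_valid_crop_size(512))   # 513  (PASCAL VOC)
print(smallest_valid_crop_size(1024))  # 1025 (Cityscapes height)
print(smallest_valid_crop_size(2048))  # 2049 (Cityscapes width)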

@jrabary jrabary commented Apr 10, 2018

Thanks @aquariusjay. We follow your parameters for PASCAL VOC exactly; the crop size is set to 513. We noticed that in the pre-processing code the image is randomly scaled even in eval mode. Is that correct? After this data augmentation the image size can be, for example, [670, 1000, 3], which causes the error in the xception65 forward function.

@aquariusjay aquariusjay commented Apr 10, 2018

If you need multi-scale inputs during inference, please call this function:
https://github.com/tensorflow/models/blob/master/research/deeplab/model.py#L91
This should already be handled in eval.py:
https://github.com/tensorflow/models/blob/master/research/deeplab/eval.py#L112

Do not use the pre-processing for multi-scale inputs during inference.
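
For reference, a hedged sketch of such a call (argument names follow model.py at the time; check the linked files for the authoritative signature):

# Sketch only; see the linked model.py / eval.py for the real code.
predictions = model.predict_labels_multi_scale(
    images,
    model_options=model_options,
    eval_scales=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
    add_flipped_images=True)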

@jrabary jrabary commented Apr 11, 2018

This problem appears when we perform a single-scale test. We do not explicitly call the pre-processing function during evaluation; it is called in the get method of input_generator:

original_image, image, label = input_preprocess.preprocess_image_and_label(

If you take a look at this function, the data augmentation is always performed, even in eval mode:

# Data augmentation by randomly scaling the inputs.

We believe this can be problematic; in fact, during evaluation the input image can have a shape that is not compatible with the xception65 network.

@aquariusjay aquariusjay commented Apr 11, 2018

During inference, there is no need to do any data augmentation. You could simply set min_scale_factor = max_scale_factor = 1, which is what we do in the provided examples.

Also, if you really think it is a problem, you could add if is_training before those preprocessing functions.
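
A sketch of that guard, assuming the scaling block in input_preprocess.py follows the comment quoted above (function names follow deeplab's preprocess_utils; verify against the actual file):

# Skip the random-scale augmentation outside of training.
if is_training:
  # Data augmentation by randomly scaling the inputs.
  scale = preprocess_utils.get_random_scale(
      min_scale_factor, max_scale_factor, scale_factor_step_size)
  processed_image, label = preprocess_utils.randomly_scale_image_and_label(
      processed_image, label, scale)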

@jrabary jrabary commented Apr 13, 2018

Adding if is_training is what we finally did, and we get roughly the same results as yours. Thanks for answering.

@jrabary jrabary closed this Apr 13, 2018
@95xueqian 95xueqian commented May 29, 2018

How do we set k? @aquariusjay

@manasb26 manasb26 commented Oct 15, 2019

k is basically the spatial size of the feature map produced by the feature extractor network. For example, with output_stride = 16 and an input image size of 512, we get k = 512 / 16 = 32.
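
In other words, given an eval_crop_size you can solve for k directly; a quick illustrative check:

output_stride = 16
crop_size = 513                       # eval_crop_size used for PASCAL VOC
k = (crop_size - 1) // output_stride  # 512 / 16 = 32
assert crop_size == output_stride * k + 1
print(k)  # 32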
