input size and crop #22

Closed
dinoByteBr opened this issue May 18, 2020 · 10 comments

Comments

@dinoByteBr

Thanks a lot for your awesome-performing model!
I'm wondering about scaling and random crop: for training you first scale and then crop to 288x288, so the tensor has this size (288). What role does scaling play here then, and why do you talk about 320x320 as the input size instead of 288x288?

RescaleT(320), 
RandomCrop(288),

With your latest model update it looks to me like upscaling supports different aspect ratios. Or is only square input supported, or does e.g. 640x480 work as well?
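For concreteness, here is a minimal sketch of the scale-then-crop order discussed above, written with standard torchvision stand-ins rather than the repo's own RescaleT / RandomCrop / ToTensorLab classes; only the sizes mirror the snippet, everything else is illustrative.

```python
# Sketch only: torchvision stand-ins for RescaleT(320) -> RandomCrop(288).
# Every training image is first rescaled to 320x320, then a random 288x288
# patch is cropped from it, so the tensor that reaches net() is 288x288.
from PIL import Image
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.Resize((320, 320)),   # plays the role of RescaleT(320)
    transforms.RandomCrop(288),      # plays the role of RandomCrop(288)
    transforms.ToTensor(),           # plays the role of ToTensorLab(flag=0)
])

img = Image.new("RGB", (640, 480))   # stand-in for a 640x480 training image
x = train_tf(img).unsqueeze(0)
print(x.shape)                       # torch.Size([1, 3, 288, 288])
```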

@xuebinqin
Owner

xuebinqin commented May 18, 2020 via email

@dinoByteBr
Author

Thanks for your detailed answer. I'm starting to be afraid I completely miss the point here; sorry if this question is too dumb.
As far as I can see, it doesn't matter with what size RescaleT is called in training: the input to net() in net(inputs_v) is always the crop size (288), since the crop happens after the scale.

I understood the crop is for data augmentation, but shouldn't the test size then be the same as the crop size?

@xuebinqin
Owner

xuebinqin commented May 19, 2020 via email (his reply is quoted in full in a later comment below)

@dinoByteBr
Author

Thanks, now everything is clear and I could reproduce good results with arbitrary sizes!
Although I still wonder how the model can handle the same objects when they are just further away or closer, if it's not scale invariant. Anyway, I won't bother you anymore, thanks again!

@mgstar1021

Thanks for your great work!

I am using this in an iOS application and converted it to an MLModel. When I use the MLModel, I want to allow arbitrary input sizes, but it seems to support only square sizes, not e.g. portrait or landscape sizes like 240x320 or 320x300.

I am getting an error with arbitrary sizes. What is the solution? Is there a problem with the conversion?

@xuebinqin
Owner

xuebinqin commented Jul 19, 2021 via email (the reply is quoted in the next comment)

@mgstar1021

It is safer to resize all the inputs to 320x320, which will theoretically give better results. Since there are several downsample and upsample operations, your size may trigger some errors in that part. So it would be good to show the error, otherwise we can't give an exact solution.


Thanks for your reply. Is it better to use a square image than a portrait or landscape image with one side (height or width) set to 320? Is it difficult to support that?
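To make the "resize everything to 320x320" advice above concrete, here is a minimal sketch. It is not the repo's exact u2net_test.py: the mean/std normalization from ToTensorLab is omitted, the predict_mask helper is made up for illustration, and it assumes the loaded net returns its fused saliency map as the first output.

```python
# Sketch: handle arbitrary aspect ratios by resizing to a 320x320 square,
# running the network, and resizing the prediction back to the original size.
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision.transforms import functional as TF

def predict_mask(net, image: Image.Image) -> Image.Image:
    w, h = image.size
    x = TF.to_tensor(image.convert("RGB").resize((320, 320), Image.BILINEAR))
    x = x.unsqueeze(0)                    # [1, 3, 320, 320]
    with torch.no_grad():
        pred = net(x)[0]                  # assumed: first output is the fused map, [1, 1, 320, 320]
    pred = (pred - pred.min()) / (pred.max() - pred.min() + 1e-8)  # min-max normalize
    pred = F.interpolate(pred, size=(h, w), mode="bilinear", align_corners=False)
    return Image.fromarray((pred.squeeze().cpu().numpy() * 255).astype(np.uint8))
```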

@xuebinqin
Owner

xuebinqin commented Jul 19, 2021 via email

@rdutta1999

"but should't be then the test size the same as used for crop?" RES: No. The networks are usually (theoretically) translation invariant but not scale invariant. The cropping mainly changes the translation. But it doesn't change the receptive fields. In both training and test, keeping the scaling consistent is necessary, while cropping isn't. Because most of the networks are not scale invariant. Besides, cropping in testing will introduce another problem. How can we achieve the complete prediction map of the whole input image in the testing process.


Hey, first of all, thank you for your work. Eagerly waiting to see your new paper (and model).

Regarding the fact that the input sizes are different for training (288x288 after cropping) and testing (320x320 after resizing), you say that scaling has to be consistent since models are generally not scale-invariant.
This brings me to the following questions:

  1. Since cropping is applied after resizing, the model gets 288x288 input images during training, whereas during testing the input images are 320x320. Since the images are of different shapes, aren't their scales different (thus leading to a case of scale variance)?
  2. Wouldn't it be better to apply percentage-based cropping (to account for different dataset images) and then resize to 320x320? If this were done, the model would see the same input size (320x320) during both training and testing, keeping the scaling consistent (see the sketch after this comment).

Once again, thanks a lot for your work. This model is a godsend.
I have been using it for my own background removal module (trained on 720x720 images with an L2 loss to predict an alpha matte).
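As a sketch of the percentage-based crop idea in point 2 (not something the repo currently does): torchvision's RandomResizedCrop crops a random fraction of the image area and always resizes the crop to a fixed output size, so training and testing would both see 320x320. The scale range below is just one choice, roughly matching a 288/320 crop per side.

```python
# Sketch: percentage-based crop followed by a resize to a fixed 320x320,
# so train and test inputs share the same size.
# (In a real saliency pipeline the same random crop would of course have to
# be applied to the label mask as well.)
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(320, scale=(0.81, 1.0), ratio=(1.0, 1.0)),  # ~ (288/320)^2 of the area
    transforms.ToTensor(),
])

test_tf = transforms.Compose([
    transforms.Resize((320, 320)),
    transforms.ToTensor(),
])
```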

@xiemeilong

"but should't be then the test size the same as used for crop?" RES: No. The networks are usually (theoretically) translation invariant but not scale invariant. The cropping mainly changes the translation. But it doesn't change the receptive fields. In both training and test, keeping the scaling consistent is necessary, while cropping isn't. Because most of the networks are not scale invariant. Besides, cropping in testing will introduce another problem. How can we achieve the complete prediction map of the whole input image in the testing process.

On Mon, May 18, 2020 at 3:39 PM dinoByteBr @.***> wrote: thanks for your detailed answer, I start to be afraid completely miss the point here, sorry if this question is too dumb. as far as I see it, it doesn't matter with what size RescaleT is called in training, the inputs for net() is always the crop size (288) in net(inputs_v) -> crop happens after scale. I understood crop is for data aug, but should't be then the test size the same as used for crop? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#22 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADSGORNLKWQZMJCHGAWXVWTRSGTIJANCNFSM4NEKY46A .
-- Xuebin Qin PhD Department of Computing Science University of Alberta, Edmonton, AB, Canada Homepage:https://webdocs.cs.ualberta.ca/~xuebin/

Hey, first of all, thank you for your work. Eagerly waiting to see your new paper (and model).

Regarding the fact that the input sizes are different for training (288x288 after cropping) and training (320x320 after resizing), you say that scaling has to be consistent since generally models are not scale-invariant. This brings me to the following questions:-

  1. Since cropping is being applied after resizing, the model gets 288x288 sized input images during training, whereas during testing, the input images are 320x320. Since the images are of different shapes, aren't their scales different (thus leading to a case of scale-variance)?
  2. Wouldn't it be better to apply a percentage based cropping (to account for different dataset images) and then resizing them to 320x320? If this is done, the model would have the same input size (320x320) during both training and testing, thus keeping the scaling consistent.

Once again, thanks a lot for your work. This model is a godsend. I have been using it for my own background removal module (trained on 720x720 images and L2 loss to predict an alpha matte).

@xuebinqin I have the same doubts
