The order for fc layer and pooling operation #6

Closed
darkpromise98 opened this issue Dec 25, 2021 · 2 comments

Comments

@darkpromise98

Thanks for your great work.

I notice that the paper applies the pooling operation after the fc layer (which maps the original feature dimension to the embedding dimension):

vse_infty/lib/encoders.py

Lines 99 to 104 in c9943b2

features = self.fc(images)
if self.precomp_enc_type == 'basic':
    # When using pre-extracted region features, add an extra MLP for the embedding transformation
    features = self.mlp(images) + features
features, pool_weights = self.gpool(features, image_lengths)

Why not apply the pooling operation before the fc layer? I think that would reduce computation. Or would it hurt performance? Have you tried it?
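
Whether the two orders can differ at all depends on the pooling operator. The following is a minimal sketch in plain Python, not code from the repository: the 2x2 weight matrix, the three region features, and the helpers `linear`, `mean_pool`, and `max_pool` are all made up for illustration. It suggests that mean pooling commutes with a single linear layer, so swapping the order changes only the cost, while max pooling does not commute, so there the order is a real modelling choice.

```python
# Toy illustration only: weights and features are made up, and
# `linear`, `mean_pool`, `max_pool` are hypothetical helpers, not repo code.

def linear(x, weight, bias):
    """Apply y = W x + b to a single feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def mean_pool(vectors):
    """Average a list of equal-length vectors, dimension by dimension."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def max_pool(vectors):
    """Take the per-dimension maximum over a list of equal-length vectors."""
    return [max(col) for col in zip(*vectors)]

weight = [[1.0, -2.0], [0.5, 3.0]]              # stand-in for the fc layer (2 -> 2)
bias = [0.0, 1.0]
regions = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # three stand-in region features

def fc_then_pool(pool):
    """Order used in the quoted snippet: project every region, then pool."""
    return pool([linear(r, weight, bias) for r in regions])

def pool_then_fc(pool):
    """Reversed order: pool the raw regions, then project the single result."""
    return linear(pool(regions), weight, bias)

def gap(pool):
    """Largest per-dimension difference between the two orders."""
    return max(abs(a - b) for a, b in zip(fc_then_pool(pool), pool_then_fc(pool)))

mean_gap = gap(mean_pool)  # the two orders agree for mean pooling
max_gap = gap(max_pool)    # the two orders disagree for max pooling
print("mean pooling gap:", mean_gap)
print("max pooling gap: ", max_gap)
```

The toy says nothing about which order trains better. It only shows that for a non-linear pooling operator the two orders compute different functions, and the same caveat would apply if the extra `self.mlp` branch contains a nonlinearity.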

@woodfrog
Owner

Thank you for your interest!

We didn't try the reversed order, but one intuition for the current order is:

All local (e.g., region-level features of an image, word-level features of a sentence) and global representations share the same embedding space where the similarity between data samples is measured, and the GPO directly operates on this unified embedding space. This might make the optimization of the pooling operator easier as the gradients can directly backpropagate from the similarity measurement function to the parameters of GPO, without the potential disturbance from the fc layers.

This is just our intuition, and reversing the order might also work well. In terms of computation, reversing the order does reduce the cost, but the saving is likely minor compared to the cost of the backbones. After all, an fc layer applied before the pooling operation is equivalent to a 1x1 conv, and backbones such as ConvNets contain many more conv operations.
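
To put a rough number on the saving, here is a back-of-the-envelope multiply-accumulate (MAC) count for the fc layer in both orders. The sizes below (36 regions, 2048-dimensional input features, 1024-dimensional embeddings) are assumptions chosen for illustration and are not stated in this thread, `fc_macs` is a hypothetical helper, and the cost of the pooling operator itself is ignored.

```python
# Rough MAC count for the fc layer on one image; all sizes are assumptions.

def fc_macs(num_vectors, in_dim, out_dim):
    """MACs for applying one in_dim -> out_dim linear layer to num_vectors vectors."""
    return num_vectors * in_dim * out_dim

num_regions = 36   # assumed number of region features per image
in_dim = 2048      # assumed backbone feature size
out_dim = 1024     # assumed embedding size

fc_first = fc_macs(num_regions, in_dim, out_dim)  # fc on every region, then pool
pool_first = fc_macs(1, in_dim, out_dim)          # pool to one vector, then fc
saved = fc_first - pool_first

print(f"fc -> pool: {fc_first:,} MACs")
print(f"pool -> fc: {pool_first:,} MACs")
print(f"saved:      {saved:,} MACs")
```

Under these assumptions the fc cost shrinks by a factor equal to the number of regions. Judging whether that matters still requires the backbone's own MAC count, which is not given here.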

@darkpromise98
Author

Thanks for your reply.
