We didn't try the reversed order, but one intuition for the current order is:
All local representations (e.g., region-level features of an image, word-level features of a sentence) and global representations share the same embedding space in which the similarity between data samples is measured, and GPO operates directly on this unified embedding space. This might make the pooling operator easier to optimize, since gradients can backpropagate directly from the similarity measurement function to the parameters of GPO, without potential disturbance from the fc layers.
This is just our intuition, and reversing the order might also work well. In terms of computation, reversing the order would indeed reduce it, but the savings are likely minor compared to the cost of the backbones. After all, an fc layer before the pooling operation is just a 1x1 conv, and there are many more conv operations in a ConvNet backbone.
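To make the computation point concrete, here is a minimal sketch with hypothetical shapes; simple mean pooling stands in for the learned GPO. It compares the per-sample fc cost under the two orderings:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: N local features of dim D_in, projected to embedding dim D_emb.
N, D_in, D_emb = 36, 2048, 1024
feats = torch.randn(2, N, D_in)  # (batch, n_local_features, D_in)

fc = nn.Linear(D_in, D_emb)

# Current order: fc on every local feature, then pool.
# The fc acts like a 1x1 conv over the N positions: ~N * D_in * D_emb MACs per sample.
out_fc_first = fc(feats).mean(dim=1)

# Reversed order: pool first, then one fc on the global vector:
# ~D_in * D_emb MACs per sample, i.e. N times fewer for the fc alone.
out_pool_first = fc(feats.mean(dim=1))

print(out_fc_first.shape, out_pool_first.shape)  # both: torch.Size([2, 1024])
```

Either way, the fc cost is small next to the backbone's convolutions, which is why the choice is driven by the optimization intuition above rather than by FLOPs.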
Thanks for your great work.
I noticed that the paper applies the pooling operation after the fc layer (which maps the original feature dims to the embedding dims):
vse_infty/lib/encoders.py, lines 99 to 104 at commit c9943b2
Why not apply the pooling operation before the fc layer? I think that would reduce computation; or does it hurt performance? Have you tried it?
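The referenced snippet isn't reproduced in the issue; as a rough, hypothetical sketch of the ordering being asked about (the actual GPO in encoders.py is a learned pooling module, simplified to a max here):

```python
import torch.nn as nn

class EncoderSketch(nn.Module):
    """Hypothetical illustration of the fc-then-pool ordering; not the repo's code."""
    def __init__(self, feat_dim=2048, embed_dim=1024):
        super().__init__()
        self.fc = nn.Linear(feat_dim, embed_dim)  # feature dims -> embedding dims

    def forward(self, local_feats):               # (batch, n_regions, feat_dim)
        emb = self.fc(local_feats)                # project each local feature
        return emb.max(dim=1).values              # placeholder for the learned GPO
```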