Hot take here 🔥. NaViT may have made it possible to handle images with varied aspect ratios, but it did not solve handling arbitrary resolutions. For that, interpolation or extrapolation of the positional embeddings is still needed. The fractional factorized positional embeddings (height and width) are initialized as learnable 1-dimensional vectors of fixed size, so if either dimension of the input image exceeds that fixed size, there will be an indexing error. Maybe I'm wrong, but this is what it looks like to me from some publicly available implementations 🤷🏻♂️. Would love some input on this, it's driving me crazy 🤯.
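To make the concern concrete, here is a minimal NumPy sketch of what I mean. It is not taken from the NaViT paper or from any specific implementation: `MAX_PATCHES`, `DIM`, `factorized_pos_embed`, and `resize_table` are hypothetical names, and random arrays stand in for the learned per-axis tables. It assumes the embedding is a plain index lookup per axis, and it uses linear resampling as one possible workaround.

```python
import numpy as np

MAX_PATCHES = 8  # hypothetical number of entries per axis table
DIM = 4          # hypothetical embedding width

rng = np.random.default_rng(0)
pos_h = rng.normal(size=(MAX_PATCHES, DIM))  # stands in for the learned height table
pos_w = rng.normal(size=(MAX_PATCHES, DIM))  # stands in for the learned width table


def factorized_pos_embed(n_h, n_w, table_h=pos_h, table_w=pos_w):
    """Sum of per-axis lookups for an n_h x n_w patch grid, shape (n_h, n_w, DIM)."""
    rows = table_h[np.arange(n_h)]  # IndexError once n_h exceeds the table length
    cols = table_w[np.arange(n_w)]  # same for n_w
    return rows[:, None, :] + cols[None, :, :]


def resize_table(table, new_len):
    """Linearly resample a (length, dim) table to new_len rows (assumed workaround)."""
    old_len, dim = table.shape
    src = np.linspace(0.0, old_len - 1, new_len)  # new positions in old index space
    xp = np.arange(old_len)
    return np.stack([np.interp(src, xp, table[:, d]) for d in range(dim)], axis=1)


# A 6 x 8 grid fits in the tables; a 12 x 8 grid does not.
ok = factorized_pos_embed(6, 8)
try:
    factorized_pos_embed(12, 8)
except IndexError as err:
    print("lookup failed:", err)

# Stretching the height table to 12 entries makes the larger grid usable again.
stretched = factorized_pos_embed(12, 8, table_h=resize_table(pos_h, 12))
```

Note that `resize_table` only stretches the trained range over more positions, which is interpolation. It says nothing about how well the model would behave at resolutions it never saw during training.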