Hi, I would like to know how the loss between the VideoSwin features and the CLIP features is computed in the latest paper. Swin-family models use a 4x4 patch size, whereas ViT uses a patch size of 16, so the two feature grids do not have the same resolution. How do you compute the loss (i.e., the L1 loss) between the two?
Thanks.
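For anyone reading along, here is a minimal sketch of one common way to compare feature grids of different resolution. It is an assumption, not the authors' confirmed method. With a 224x224 input, a Swin backbone ends at a 7x7 grid (4x4 patches followed by three 2x patch-merging stages), while a ViT with 16x16 patches yields 14x14. The sketch average-pools the finer CLIP grid down to the coarser one, maps the Swin channels to the CLIP width with a stand-in linear head, and takes the L1 loss over masked positions only. The helper names `avg_pool_to` and `masked_l1`, the channel sizes, and the random `proj` matrix are hypothetical, and the temporal dimension is ignored.

```python
import numpy as np

def avg_pool_to(grid, out_h, out_w):
    """Average-pool an (H, W, C) feature grid down to (out_h, out_w, C).

    Assumes H and W are integer multiples of out_h and out_w.
    """
    h, w, c = grid.shape
    fh, fw = h // out_h, w // out_w
    assert fh * out_h == h and fw * out_w == w, "grid must divide evenly"
    # Split each spatial axis into (output index, within-window index) and
    # average over the two within-window axes.
    return grid.reshape(out_h, fh, out_w, fw, c).mean(axis=(1, 3))

def masked_l1(pred, target, mask):
    """Mean absolute error over the positions where mask is True.

    pred, target: (H, W, C) arrays; mask: (H, W) boolean array.
    """
    assert pred.shape == target.shape
    diff = np.abs(pred - target)[mask]
    return float(diff.mean()) if diff.size else 0.0

rng = np.random.default_rng(0)
clip_feat = rng.standard_normal((14, 14, 512))   # illustrative ViT/16 grid
swin_feat = rng.standard_normal((7, 7, 1024))    # illustrative Swin final-stage grid
proj = rng.standard_normal((1024, 512)) * 0.02   # stand-in for a learned linear head

target = avg_pool_to(clip_feat, 7, 7)            # 14x14 -> 7x7
pred = swin_feat @ proj                          # 1024 -> 512 channels
mask = rng.random((7, 7)) < 0.3                  # positions that were masked
loss = masked_l1(pred, target, mask)
```

The opposite choice, upsampling the Swin grid to 14x14, would work the same way. In PyTorch the pooling and the loss correspond to `torch.nn.functional.adaptive_avg_pool2d` and `torch.nn.functional.l1_loss`.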
Thanks! Meanwhile, I am curious how you apply random masking and blockwise masking together at a ratio of 30%, since these are two disjoint masking strategies. Is there any code for this?
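Until the authors confirm, one possible reading is that each sample draws one of the two strategies at random, and whichever is drawn masks 30% of the patches. The sketch below implements that reading only; it is an assumption, not the paper's code. The function names, the `p_block` mixing probability, and the `max_block` size limit are all hypothetical.

```python
import numpy as np

def random_mask(h, w, ratio, rng):
    """Mask round(h*w*ratio) patches chosen uniformly at random."""
    n = int(round(h * w * ratio))
    flat = np.zeros(h * w, dtype=bool)
    flat[rng.choice(h * w, size=n, replace=False)] = True
    return flat.reshape(h, w)

def blockwise_mask(h, w, ratio, rng, max_block=4):
    """Mask rectangular blocks until round(h*w*ratio) patches are covered."""
    target = int(round(h * w * ratio))
    mask = np.zeros((h, w), dtype=bool)
    while mask.sum() < target:
        bh = int(rng.integers(1, min(max_block, h) + 1))
        bw = int(rng.integers(1, min(max_block, w) + 1))
        top = int(rng.integers(0, h - bh + 1))
        left = int(rng.integers(0, w - bw + 1))
        mask[top:top + bh, left:left + bw] = True
    # The last block may overshoot; unmask random patches to hit the exact count.
    extra = int(mask.sum()) - target
    if extra > 0:
        flat = mask.reshape(-1)  # view onto the same contiguous buffer
        on = np.flatnonzero(flat)
        flat[rng.choice(on, size=extra, replace=False)] = False
    return mask

def sample_mask(h, w, ratio=0.3, p_block=0.5, rng=None):
    """Pick blockwise masking with probability p_block, else random masking."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < p_block:
        return blockwise_mask(h, w, ratio, rng)
    return random_mask(h, w, ratio, rng)

mask = sample_mask(14, 14, ratio=0.3, rng=np.random.default_rng(0))
```

Other readings are equally plausible, for example splitting the 30% budget between the two strategies and taking the union of the resulting masks.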