Hello, thank you very much for your work. I would like to ask whether scaling up the pre-trained CLIP backbone helps improve the final results, for example by changing ViT-B-16 to ViT-L-14. After I switched CLIP to ViT-L-14, the results did not improve, so I don't know whether there is a problem with my change or whether a larger backbone simply does not help with this method.
Thanks for your interest in our work. I'd expect ViT-L-14 to yield better results, but the hyperparameters might need to be changed. In particular, avoiding overfitting could be more challenging because ViT-L has an internal dimension of 1024 (versus 768 in ViT-B).
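To give a rough sense of the capacity gap behind the overfitting concern, here is a minimal sketch that compares the two vision towers using the standard CLIP settings (width 768 with 12 layers for ViT-B-16, width 1024 with 24 layers for ViT-L-14, 224x224 input). The `VisionTowerConfig` class and the `12 * width^2` per-block estimate are illustrative assumptions, not code from this repository; the estimate ignores biases, layer norms, and embeddings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionTowerConfig:
    """Hypothetical container for the few numbers compared here."""
    name: str
    width: int        # internal (hidden) dimension of the transformer
    layers: int       # number of transformer blocks
    patch_size: int   # side length of each square image patch
    image_size: int = 224

    def num_tokens(self) -> int:
        # One token per patch, plus the class token.
        patches_per_side = self.image_size // self.patch_size
        return patches_per_side ** 2 + 1

    def approx_block_params(self) -> int:
        # Per block: about 4*w^2 for attention (Q, K, V, output projection)
        # and 8*w^2 for an MLP with 4x expansion. Rough estimate only.
        return 12 * self.width ** 2 * self.layers


VIT_B_16 = VisionTowerConfig("ViT-B-16", width=768, layers=12, patch_size=16)
VIT_L_14 = VisionTowerConfig("ViT-L-14", width=1024, layers=24, patch_size=14)

if __name__ == "__main__":
    for cfg in (VIT_B_16, VIT_L_14):
        print(f"{cfg.name}: width={cfg.width}, tokens={cfg.num_tokens()}, "
              f"approx block params={cfg.approx_block_params() / 1e6:.0f}M")
    ratio = VIT_L_14.approx_block_params() / VIT_B_16.approx_block_params()
    print(f"ViT-L-14 has roughly {ratio:.1f}x the transformer-block parameters")
```

By this estimate the ViT-L-14 vision tower has about 3.6x the transformer-block parameters of ViT-B-16 (roughly 302M versus 85M) and processes 257 tokens per image instead of 197. Settings tuned for ViT-B-16, such as learning rate, weight decay, or number of training epochs, may therefore not transfer directly; which ones matter for this method is something the thread does not specify.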