BIKE has fewer parameters than the original CLIP ViT-L/14 because BIKE uses only CLIP's vision encoder and does not include the parameters of CLIP's text encoder.
In fact, the visual encoder alone has 303M parameters for ViT-L/16; this count excludes the text encoder.
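As a rough sanity check on that figure, the parameter count of the vision tower can be estimated from its hyperparameters. This is only a sketch: the function name `vit_param_count` is hypothetical, and the defaults (width 1024, 24 layers, MLP ratio 4, 224x224 input, 768-dim output projection, bias-free patch convolution) are assumed from the standard ViT-L configuration and a CLIP-style layout, not taken from the BIKE code.

```python
def vit_param_count(width=1024, layers=24, patch=16, image=224,
                    mlp_ratio=4, out_dim=768):
    """Estimate parameters of a CLIP-style ViT vision tower (assumed layout)."""
    tokens = (image // patch) ** 2 + 1      # image patches plus one class token
    embed = 3 * patch * patch * width       # patch-embedding conv, assumed bias-free
    embed += width                          # class token
    embed += tokens * width                 # learned positional embedding
    hidden = mlp_ratio * width
    block = (
        3 * width * width + 3 * width       # fused QKV projection (weight + bias)
        + width * width + width             # attention output projection
        + width * hidden + hidden           # MLP up-projection
        + hidden * width + width            # MLP down-projection
        + 2 * 2 * width                     # two LayerNorms (weight + bias each)
    )
    norms = 2 * 2 * width                   # LayerNorm before and after the transformer
    head = width * out_dim                  # final projection (set out_dim=0 to omit)
    return embed + layers * block + norms + head


for patch in (14, 16):
    total = vit_param_count(patch=patch)
    print(f"ViT-L/{patch} vision tower: {total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate for ViT-L/16 is about 304M, or about 303.3M when the final projection is left out (`out_dim=0`), which is consistent with the 303M quoted above. The text encoder contributes nothing to this total because it is never counted.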
leexinhao changed the title from "The reason why the parameters of BIKE are smaller than the original CLIP ViT-L/14 is that in the BIKE model, we only utilize the vision encoder from CLIP and do not include the parameters of CLIP's text encoder." to "The parameters of BIKE are smaller than the original CLIP ViT-L/14 is that in the BIKE model." on Jun 29, 2023
Originally posted by @whwu95 in #3 (comment)