The data goes through self-attention and then, after layer normalization, into cross-attention. Why, in the code, are the `encoder_hidden_states` for cross-attention the text and image embeddings rather than the output of the preceding self-attention layer?
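For context, here is a minimal sketch of the pattern being asked about (a hypothetical `DecoderBlock` in plain PyTorch, not this repository's actual code): in a standard cross-attention layer, the query comes from the block's own hidden states (the self-attention output), while the key and value come from `encoder_hidden_states` — the fixed text/image embeddings — so the model attends to the conditioning context rather than to itself.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Hypothetical sketch of a self-attention -> cross-attention block."""

    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, encoder_hidden_states: torch.Tensor):
        # Self-attention: tokens attend to each other.
        h, _ = self.self_attn(x, x, x)
        x = self.norm1(x + h)
        # Cross-attention: query = current hidden states (self-attention output),
        # key/value = encoder_hidden_states (the text/image embeddings).
        h, _ = self.cross_attn(x, encoder_hidden_states, encoder_hidden_states)
        x = self.norm2(x + h)
        return x
```

In this layout the self-attention output is not discarded: it flows into cross-attention as the *query*, while `encoder_hidden_states` supply only the key/value, which is the conventional way to condition on an external modality.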