Thank you for sharing your excellent work.
I have some questions about pipeline.gif. In the loss functions, the video features and text features form a matrix. Some of these matrices are diagonal and some are symmetric, and I don't understand what determines the matrix type.
My understanding is that a symmetric matrix means videos and prompts are matched one to one, and a diagonal matrix means multiple videos correspond to one prompt text. For example, the stat loss is drawn as a symmetric matrix in the pipeline. However, I think z[CNT] has only one positive pair, so Lstat should be a diagonal matrix: each predicted statistic should correspond to exactly one prompt statistic.
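To make the two patterns concrete, here is a minimal NumPy sketch of the target (ground-truth) matrices I have in mind. The function names and the `prompt_ids` input are hypothetical and not taken from this repository, and the sketch only builds the target patterns, not the actual loss.

```python
import numpy as np

def one_to_one_targets(batch_size):
    # Assumption: each video has exactly one matching prompt in the batch,
    # so the only positives are on the diagonal (identity matrix).
    return np.eye(batch_size)

def shared_prompt_targets(prompt_ids):
    # Assumption: videos with the same prompt id are all positives for that
    # prompt, which gives a symmetric matrix with off-diagonal ones.
    ids = np.asarray(prompt_ids)
    return (ids[:, None] == ids[None, :]).astype(float)

diag_targets = one_to_one_targets(3)
sym_targets = shared_prompt_targets([0, 0, 1])  # videos 0 and 1 share a prompt
```

If Lstat has a single positive pair per sample, I would expect its target to look like `diag_targets` rather than `sym_targets`.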
I look forward to your reply. Thanks.