I have a question regarding the softmax cross-entropy between the logits from f_q and f_k and the target y_true. The pseudo-code measures the cross-entropy over the concatenated logits (the softmax of the positive pair should approach 1, and the 255 negatives should approach 0?). If that is the case, where we maximize the positive logit (the similarity between corresponding spatial locations) and make the negative patches dissimilar relative to it, why does the pseudo-code minimize the cross-entropy between the logits and a target of all zeros? If we were trying to drive every value to zero, wouldn't that make the positive locations dissimilar along with the negatives? Am I missing something simple?
Thanks
Edit:
I should clarify: the part of the pseudo-code I am referring to is where we have the logits (the positive and negative logits concatenated) and flatten them. The target given is torch.zeros(B*S), i.e. [0...0]. From my understanding (or misunderstanding), shouldn't the target label be [1, 0...0], where the 1 corresponds to the positive similarity and the zeros correspond to the cosine similarities of the negative pairs?
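For what it's worth, I think the confusion comes from how PyTorch's cross-entropy interprets the target: torch.nn.CrossEntropyLoss (and F.cross_entropy) takes integer *class indices*, not one-hot vectors or target probabilities. Since the positive logit is concatenated at position 0, a target of all zeros means "class 0 is correct for every sample", which is exactly the [1, 0...0] one-hot target you describe. A minimal numpy sketch of the equivalence (the toy logits here are made up, and `ce_with_index` / `ce_with_one_hot` are hypothetical helper names, not anything from the repo):

```python
import numpy as np

# Toy logits for one sample: index 0 is the positive pair,
# the remaining entries are negatives.
logits = np.array([2.0, 0.1, -0.5, 0.3])

def ce_with_index(logits, label):
    # Softmax cross-entropy against an integer class index,
    # which is what nn.CrossEntropyLoss expects: label 0 means
    # "the correct class is at index 0", not "the target value is 0".
    shifted = logits - logits.max()              # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def ce_with_one_hot(logits, one_hot):
    # The same loss computed against an explicit one-hot target.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -(one_hot * log_probs).sum()

one_hot = np.zeros_like(logits)
one_hot[0] = 1.0                                 # [1, 0, ..., 0]

# Integer label 0 and one-hot [1, 0, ..., 0] give identical losses.
assert np.isclose(ce_with_index(logits, 0), ce_with_one_hot(logits, one_hot))
```

So minimizing this loss pushes the softmax probability at index 0 (the positive) toward 1 and the negatives toward 0, which matches the intended behavior.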