
Similarity Preserving KD #221

Closed
hanoonaR opened this issue Apr 2, 2022 · 2 comments

Comments


hanoonaR commented Apr 2, 2022

Hi,
Thanks for your amazing work.

I have a question about the implementation of the Similarity Preserving KD loss, specifically this line:

`spkd_loss = spkd_losses.sum()`

In the implementation, the loss is calculated by taking the Frobenius norm of the difference between the teacher's and student's square similarity matrices, and then summing the result. However, `torch.norm` would compute the norm and return a single value for a layer, so I am confused about why we take a sum over it (if it is a single value)?
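(As an illustrative aside, not part of the original thread: `torch.norm` collapses the whole tensor to one scalar only when no `dim` is given; with `dim` it returns one value per slice, which is what makes a subsequent `.sum()` meaningful.)

```python
import torch

x = torch.ones(4, 4)

# Without dim, torch.norm reduces the whole tensor to a single scalar
assert torch.norm(x).dim() == 0

# With dim=1, it returns one norm per row -> a length-4 tensor
row_norms = torch.norm(x, dim=1)
assert row_norms.shape == (4,)

# so .sum() aggregates the per-row values rather than a lone scalar
# (each row of ones(4, 4) has L2 norm 2, so the sum is 8)
assert row_norms.sum().item() == 8.0
```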

The paper says the loss is the sum, over different layer pairs, of the mean element-wise squared difference between the two square matrices. So does the sum correspond to the different layers?

Thank you.

@yoshitomo-matsubara
Owner

Hi @hanoonaR ,

I think the summation you mentioned above is taken over the batch. Say your training batch size is 32; then `spkd_losses` holds 32 loss values. So we first take the sum over the batch, and if we choose `batch_mean` as the reduction, we reduce the value by dividing it by `b ** 2`, as shown in Eq. 4 of the paper.
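To make the batch-wise structure concrete, here is a minimal sketch (not the actual torchdistill code; the function name and `reduction` handling are illustrative) of Similarity-Preserving KD along these lines: build the row-normalized b×b similarity matrices, take one norm per batch row, sum, and optionally divide by `b ** 2`:

```python
import torch
import torch.nn.functional as F


def spkd_loss_sketch(teacher_feat, student_feat, reduction="batch_mean"):
    """Illustrative sketch of Similarity-Preserving KD (Tung & Mori, 2019)."""
    b = teacher_feat.size(0)
    # Flatten features to (b, d)
    t = teacher_feat.flatten(1)
    s = student_feat.flatten(1)
    # Row-normalized b x b similarity matrices
    g_t = F.normalize(t @ t.t(), p=2, dim=1)
    g_s = F.normalize(s @ s.t(), p=2, dim=1)
    # One norm value per batch row -> a tensor of b loss values
    spkd_losses = torch.norm(g_t - g_s, dim=1)
    # Sum over the batch; batch_mean then divides by b**2 as in Eq. 4
    spkd_loss = spkd_losses.sum()
    if reduction == "batch_mean":
        spkd_loss = spkd_loss / b ** 2
    return spkd_loss
```

With identical teacher and student features the similarity matrices match exactly, so the loss is zero; otherwise it is a non-negative scalar.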

Also, please use the Discussions tab above for questions. As explained here, I want to keep Issues mainly for bug reports.


hanoonaR commented Apr 4, 2022

Thank you @yoshitomo-matsubara, I'll move it to the discussion.

@hanoonaR hanoonaR closed this as completed Apr 4, 2022