Hi,

Thanks for your amazing work.

I have a question regarding the implementation of the Similarity-Preserving KD loss in torchdistill/torchdistill/losses/single.py (line 467 at 7f533ba).

In the implementation, the loss is calculated by taking a Frobenius norm over the difference between the teacher's and student's square similarity matrices, and then summing the result. However, `torch.norm` would compute the norm and return a single value for a layer, so I am confused about why we take a sum over it (if it is a single value).

The paper says the loss is the summation, over different layer pairs, of the mean element-wise squared difference between the two square matrices. So does the sum correspond to different layers?

Thank you.
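For reference, here is how I understand Eq. (4) of the paper (my own transcription, so please correct me if I misread it):

```latex
% Eq. (4) of "Similarity-Preserving Knowledge Distillation" (Tung & Mori, 2019),
% as I understand it: \tilde{G}^{(l)} are the row-normalized b x b activation
% similarity matrices, b is the batch size, and \mathcal{I} is the set of
% (teacher, student) layer pairs.
\mathcal{L}_{SP}(G_T, G_S) = \frac{1}{b^2} \sum_{(l, l') \in \mathcal{I}}
    \left\lVert \tilde{G}_T^{(l)} - \tilde{G}_S^{(l')} \right\rVert_F^2
```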
I think the summation you mentioned above is taken over the batch. Say you have a training batch size of 32; then `spkd_losses` holds 32 loss values, one per sample. So we take the sum over the batch first, and if we choose `batch_mean` as the reduction, we reduce the value by dividing it by `b ** 2`, as shown in Eq. 4 of the paper.
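To illustrate, here is a minimal sketch of that reduction. It paraphrases the implementation rather than quoting it; the `dim=1` in `torch.norm` (which yields one value per sample) and the helper names are my assumptions for this example:

```python
import torch
from torch.nn import functional

def matmul_and_normalize(z):
    # Flatten each sample and build its normalized b x b similarity matrix
    z = torch.flatten(z, 1)
    return functional.normalize(torch.matmul(z, torch.t(z)), p=2, dim=1)

def spkd_loss(teacher_outputs, student_outputs, reduction='batch_mean'):
    g_t = matmul_and_normalize(teacher_outputs)  # shape: (b, b)
    g_s = matmul_and_normalize(student_outputs)  # shape: (b, b)
    # One squared L2 norm per row, i.e. one loss value per sample: shape (b,)
    spkd_losses = torch.norm(g_t - g_s, dim=1) ** 2
    # Summing over the batch gives the squared Frobenius norm of (g_t - g_s)
    loss = spkd_losses.sum()
    b = teacher_outputs.shape[0]
    # 'batch_mean' divides by b**2, recovering the 1/b^2 factor in Eq. 4
    return loss / (b ** 2) if reduction == 'batch_mean' else loss
```

With a batch size of 32, `spkd_losses` holds 32 values; summing them and dividing by `b ** 2` matches Eq. 4 of the paper.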
Also, please use the Discussions tab above for questions. As explained here, I want to keep Issues mainly for bug reports.