Hi, thanks for the great work!
I noticed that in both the paper and the released code, the Fisher mask is computed from a randomly initialized linear layer. I was surprised that this actually leads to good performance in practice. Since the head's weights are random, doesn't that mean the gradients backpropagated into the backbone are also effectively random?
Just curious what you think of this issue. Thanks!
We were also surprised to find that a randomly initialized linear layer performs comparably to one fine-tuned for several epochs.
One observation is that, given a pretrained backbone and a randomly initialized output layer, people can typically attain adequate performance by tuning only the backbone. Accordingly, we compute the most influential gradients through the random output projection in the same way.
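To make the idea concrete, here is a minimal NumPy sketch (not the paper's actual code) of scoring backbone weights by the empirical Fisher diagonal while backpropagating through a fixed, randomly initialized head. The model sizes, the MSE loss, and the 30% sparsity ratio are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only
d_in, d_hid, d_out, n = 8, 16, 4, 32

W1 = rng.standard_normal((d_hid, d_in)) * 0.1   # "backbone" weights (to be masked)
W2 = rng.standard_normal((d_out, d_hid)) * 0.1  # random output head, never trained

X = rng.standard_normal((n, d_in))
T = rng.standard_normal((n, d_out))             # dummy targets

# Accumulate squared per-sample gradients of the loss w.r.t. the backbone
# weights (the empirical Fisher diagonal), backpropagating through the
# fixed random head.
fisher = np.zeros_like(W1)
for x, t in zip(X, T):
    h = W1 @ x                        # backbone forward
    y = W2 @ h                        # random head forward
    err = y - t                       # dL/dy for L = 0.5 * ||y - t||^2
    g_W1 = np.outer(W2.T @ err, x)    # backprop through the fixed random head
    fisher += g_W1 ** 2
fisher /= n

# Keep only the top 30% of backbone weights by Fisher score
k = int(0.3 * fisher.size)
threshold = np.partition(fisher.ravel(), -k)[-k]
mask = fisher >= threshold
print(mask.sum())  # number of backbone weights selected for tuning
```

The point the sketch illustrates: even though `W2` is random, the gradient `W2.T @ err` is a fixed linear projection of the error, so the relative Fisher scores across backbone weights still reflect which weights most influence the output, which is consistent with the observation above.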