
Fisher calculation is based on a randomly initialized linear layer? #7

Closed
Tsingularity opened this issue Apr 13, 2022 · 2 comments


@Tsingularity

Hi, thanks for the great work!

I noticed that in both the paper and the released code, the Fisher mask is computed from a randomly initialized linear layer. I am quite surprised that this actually leads to good performance in practice. Since the linear weights are randomly initialized, does that mean the gradients backpropagated into the backbone are also essentially random?

Just curious what you think about this. Thanks!

@ylsung
Collaborator

ylsung commented Apr 17, 2022

We were also surprised to find that the performance of the random linear layer is comparable to that of the linear layer that has been fine-tuned for several epochs.

One observation is that, given a backbone model and a randomly initialized output layer, one can typically reach adequate performance by tuning only the backbone. Accordingly, we identify the most influential parameters from gradients computed through that same random output projection.
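
For readers following along, here is a minimal sketch of this kind of computation, assuming a PyTorch backbone whose output feeds a freshly initialized `nn.Linear` head. The function name, `keep_ratio`, and data-loader interface are hypothetical; this is an illustration of the idea discussed above, not the repository's exact code.

```python
# Sketch: empirical Fisher scores for the backbone, accumulated from squared
# gradients that flow back through a randomly initialized linear head; the
# top-scoring parameters form the sparse mask. Names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fisher_mask(backbone, hidden_dim, num_classes, loader,
                keep_ratio=0.005, device="cpu"):
    backbone = backbone.to(device)
    # Untrained head: its weights are random, but the gradients it sends
    # into the backbone are still data-dependent, not pure noise.
    head = nn.Linear(hidden_dim, num_classes).to(device)

    params = [p for p in backbone.parameters() if p.requires_grad]
    fisher = [torch.zeros_like(p) for p in params]

    for x, y in loader:
        x, y = x.to(device), y.to(device)
        loss = F.cross_entropy(head(backbone(x)), y)
        grads = torch.autograd.grad(loss, params)
        # Empirical Fisher approximation: accumulate squared gradients.
        for f, g in zip(fisher, grads):
            f += g.detach() ** 2

    # Keep the keep_ratio fraction of parameters with the largest scores.
    scores = torch.cat([f.flatten() for f in fisher])
    k = max(1, int(keep_ratio * scores.numel()))
    threshold = torch.topk(scores, k).values.min()
    return [(f >= threshold).float() for f in fisher]
```

Under this reading, the head's randomness does not make the backbone gradients random: they still depend on the data and labels through the loss, which is why the resulting mask can remain informative.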

@Tsingularity
Author

Interesting! Thanks!
