Hi, thanks for the great work!
I noticed that in both the paper and the released code, the Fisher mask is computed from a randomly initialized linear layer. I was surprised that this actually leads to good performance in practice. Since the head's weights are random, doesn't that mean the gradients backpropagated into the backbone are also effectively random?
Just curious what you think of this issue. Thanks!
We were also surprised to find that a randomly initialized linear layer performs comparably to one fine-tuned for several epochs.
One observation is that, given a pretrained backbone and a randomly initialized output layer, people can typically attain adequate performance by tuning only the backbone. Accordingly, we compute the most influential gradients through the random output projection in the same way.
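To make the idea concrete, here is a minimal NumPy sketch (not the paper's actual code) of scoring backbone weights by the empirical Fisher diagonal while backpropagating through a fixed, randomly initialized head. The model sizes, the MSE loss, and the 30% sparsity ratio are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only
d_in, d_hid, d_out, n = 8, 16, 4, 32

W1 = rng.standard_normal((d_hid, d_in)) * 0.1   # "backbone" weights (to be masked)
W2 = rng.standard_normal((d_out, d_hid)) * 0.1  # random output head, never trained

X = rng.standard_normal((n, d_in))
T = rng.standard_normal((n, d_out))             # dummy targets

# Accumulate squared per-sample gradients of the loss w.r.t. the backbone
# weights (the empirical Fisher diagonal), backpropagating through the
# fixed random head.
fisher = np.zeros_like(W1)
for x, t in zip(X, T):
    h = W1 @ x                        # backbone forward
    y = W2 @ h                        # random head forward
    err = y - t                       # dL/dy for L = 0.5 * ||y - t||^2
    g_W1 = np.outer(W2.T @ err, x)    # backprop through the fixed random head
    fisher += g_W1 ** 2
fisher /= n

# Keep only the top 30% of backbone weights by Fisher score
k = int(0.3 * fisher.size)
threshold = np.partition(fisher.ravel(), -k)[-k]
mask = fisher >= threshold
print(mask.sum())  # number of backbone weights selected for tuning
```

The point the sketch illustrates: even though `W2` is random, the gradient `W2.T @ err` is a fixed linear projection of the error, so the relative Fisher scores across backbone weights still reflect which weights most influence the output, which is consistent with the observation above.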