Hi Yassine,

Thanks for your great work! I noticed that the paper mentions the unsupervised loss is not back-propagated through the main decoder. From my understanding, this means the trainable parameters are optimized only through the supervised loss?

Can you please help me figure out where this is implemented?

Many thanks,
For the main decoder, yes, you are correct: the main decoder is not optimized with the unsupervised loss, only with the supervised loss. To summarize: the main decoder is trained with the supervised loss, the auxiliary decoders are trained with the unsupervised loss, and the encoder is trained with both.
The stop gradient over the main decoder helps with two things: 1) it avoids collapsing solutions; if we back-propagated through both, the main decoder would collapse, since the unsupervised loss would be minimized if the predictions were all zeros, and 2) the main decoder is only trained on clean inputs, making it well adapted to test time, since the test-time inputs are also clean.
In the implementation, this is done by simply detaching the main decoder's outputs here:
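For readers without the repo open, here is a minimal PyTorch-style sketch of the idea described above. All names (`encoder`, `main_decoder`, `aux_decoders`, `perturb`, etc.) are illustrative placeholders rather than the actual identifiers from the codebase; the key point is the `.detach()` on the main decoder's output before it is used as the target for the unsupervised loss.

```python
import torch
import torch.nn.functional as F

def perturb(z, noise_std=0.1):
    # Placeholder feature perturbation; the paper uses several kinds of perturbations.
    return z + torch.randn_like(z) * noise_std

def training_step(encoder, main_decoder, aux_decoders, x_l, y_l, x_ul):
    # Supervised branch: encoder + main decoder, trained on clean labeled inputs.
    sup_pred = main_decoder(encoder(x_l))
    sup_loss = F.cross_entropy(sup_pred, y_l)

    # Unsupervised branch: the main decoder's prediction on the unlabeled batch
    # is detached, so the unsupervised loss sends no gradient into the main decoder.
    z_ul = encoder(x_ul)
    target = F.softmax(main_decoder(z_ul).detach(), dim=1)

    unsup_loss = 0.0
    for aux_decoder in aux_decoders:
        # Each auxiliary decoder predicts from a perturbed version of the features
        # and is pushed toward the (fixed) main-decoder target.
        aux_pred = aux_decoder(perturb(z_ul))
        unsup_loss = unsup_loss + F.mse_loss(F.softmax(aux_pred, dim=1), target)
    unsup_loss = unsup_loss / len(aux_decoders)

    # Gradients: encoder <- both losses, main decoder <- sup_loss only,
    # auxiliary decoders <- unsup_loss only.
    return sup_loss + unsup_loss
```

Note that detaching the main decoder's output also cuts the gradient to the encoder along that particular path; the encoder still receives the unsupervised gradient through the auxiliary decoders, which matches the setup described above.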