Hi, actually my model implementation strictly follows the one in the paper.
If you look into PyTorch's TransformerEncoderLayer implementation, you will find it applies the sub-modules in the order: self_attn -> residual -> norm -> pointwise_ff -> residual -> norm. However, in Fig. 2 of "End-to-End Neural Speaker Diarization with Self-attention", the encoder block is defined as: norm -> self_attn -> residual -> norm -> pointwise_ff -> residual, with an additional LayerNorm at the very end (before the linear + sigmoid).
Thus, applying a LayerNorm at the input in the PyTorch code yields the same sequence of operations as in the paper.
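For concreteness, here is a minimal sketch (assumed hyperparameters, not the repository's actual code) of the arrangement described above: a LayerNorm applied to the input, followed by PyTorch's stock post-norm encoder layers.

```python
import torch
import torch.nn as nn

# Illustrative values only; the real model's sizes may differ.
d_model, n_heads, n_layers, ff_dim = 256, 4, 2, 1024

# Default (post-norm) per-layer order:
# self_attn -> residual -> norm -> pointwise_ff -> residual -> norm
encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model,
    nhead=n_heads,
    dim_feedforward=ff_dim,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

# LayerNorm applied once to the input, before the encoder stack.
input_norm = nn.LayerNorm(d_model)

x = torch.randn(500, 8, d_model)  # (time, batch, feature)
y = encoder(input_norm(x))
# Resulting chain: norm -> [attn -> res -> norm -> ff -> res -> norm] x N,
# i.e. the op sequence the comment above argues corresponds to Fig. 2's
# [norm -> attn -> res -> norm -> ff -> res] x N followed by a final norm.
```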
Hi,
In Fig. 2 of "End-to-End Neural Speaker Diarization with Self-attention", LayerNorm was applied after the encoder blocks, but in your implementation the order is reversed. Is there a particular reason for that? Have a good day.