Hi. Thank you for open-sourcing this wonderful implementation! I have a small question about the code and think it might be a bug.
In these lines, you define slot_mu and slot_log_sigma using register_buffer. If I understand correctly, tensors created via register_buffer won't be updated during training (see here for reference). I also checked my trained checkpoints, and these two values are indeed unchanged throughout the training process.
Also, other Slot Attention implementations define them as trainable parameters (see the PyTorch one and the official one). So I just wonder whether this is a bug or intentional behavior? A minimal sketch contrasting the two definitions is below.
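For reference, here is a minimal sketch of what I mean (not the repository's actual code; the module name SlotInit and the learnable flag are made up for illustration). With register_buffer, slot_mu and slot_log_sigma are saved in the state_dict but never touched by the optimizer; with nn.Parameter, they receive gradients as in the official implementation.

```python
import torch
import torch.nn as nn

class SlotInit(nn.Module):
    """Hypothetical module sketching the two ways to define the slot init distribution."""

    def __init__(self, num_slots, slot_dim, learnable=True):
        super().__init__()
        self.num_slots = num_slots
        mu = torch.zeros(1, 1, slot_dim)
        log_sigma = torch.zeros(1, 1, slot_dim)
        nn.init.xavier_uniform_(mu)
        nn.init.xavier_uniform_(log_sigma)
        if learnable:
            # trainable: mu / log_sigma are updated by the optimizer,
            # as in the official and PyTorch Slot Attention implementations
            self.slot_mu = nn.Parameter(mu)
            self.slot_log_sigma = nn.Parameter(log_sigma)
        else:
            # buffer: stored in the state_dict, but stays fixed during training
            self.register_buffer("slot_mu", mu)
            self.register_buffer("slot_log_sigma", log_sigma)

    def forward(self, batch_size):
        # sample initial slots from N(mu, sigma^2)
        mu = self.slot_mu.expand(batch_size, self.num_slots, -1)
        sigma = self.slot_log_sigma.exp().expand(batch_size, self.num_slots, -1)
        return mu + sigma * torch.randn_like(mu)
```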
Update: I didn't observe much of a performance difference between trainable and fixed mu + sigma. That's very interesting.
Indeed, you're right. I actually ran experiments after fixing it, and the performance difference is very small (<5%). So I think the learned slot initialization distribution is not very important.