
Is it a bug that slot_mu and slot_log_sigma are not updated in training? #8

Closed
Wuziyi616 opened this issue Oct 16, 2021 · 2 comments

Comments

@Wuziyi616
Hi. Thank you for open-sourcing this wonderful implementation! I have a small question about the code and think it might be a bug.

In these lines, you define slot_mu and slot_log_sigma using register_buffer. If I understand correctly, tensors created via register_buffer won't be updated during training (see here for reference). I also checked my trained checkpoints, and these two values are indeed the same throughout the training process.

Also, other Slot Attention implementations define them as trainable parameters (see the PyTorch one and the official one). So I just wonder whether this is a bug or intentional behavior?
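
For context, here is a minimal sketch of the difference between the two registration mechanisms (the attribute names `slot_mu` / `slot_log_sigma` and the slot dimension are just placeholders, not necessarily this repo's exact code):

```python
import torch
import torch.nn as nn

class SlotInit(nn.Module):
    def __init__(self, slot_size: int = 64):
        super().__init__()
        # Buffers are stored in the state_dict but have requires_grad=False,
        # so the optimizer never touches them -- they keep their init values.
        self.register_buffer("slot_mu", torch.zeros(1, 1, slot_size))
        self.register_buffer("slot_log_sigma", torch.zeros(1, 1, slot_size))

        # Parameters receive gradients and are updated by the optimizer,
        # which is how the reference Slot Attention implementations define them.
        self.slot_mu_param = nn.Parameter(torch.zeros(1, 1, slot_size))
        self.slot_log_sigma_param = nn.Parameter(torch.zeros(1, 1, slot_size))

m = SlotInit()
print([name for name, _ in m.named_parameters()])  # only the nn.Parameter entries
print([name for name, _ in m.named_buffers()])     # slot_mu, slot_log_sigma
```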

Update: I didn't observe much performance difference between trainable and fixed mu+sigma. That's very interesting.

@joshalbrecht

I think you are correct that this was a bug, and not an intentional change. Thanks for flagging!

@Wuziyi616
Author

Indeed, you're right. I actually ran experiments after fixing it, and the performance difference is very small (<5%). So I think the learned slot initialization distribution is not very important.
