Some questions about Paper #2
Comments
In the ablation study we replace the query generator with learnable parameters, similar to the practice in DETR.
Thanks for your reply. So it seems to be a linear projection without a max-pooling operation. Is my understanding correct?
Not really. In that case the queries are independent of the encoder outputs: the encoder outputs are only used as the memory of the decoder (cross-attention) and to produce the coarse center points, while the decoder queries are pre-defined, learnable parameters. You can find more details in the DETR pipeline (End-to-End Object Detection with Transformers).
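To make the distinction concrete, here is a minimal NumPy sketch of the idea described above: the decoder queries are a standalone parameter matrix (not derived from the encoder), and the encoder output acts only as the memory that those queries attend to via cross-attention. All shapes, names, and the single-head attention are hypothetical simplifications, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_queries, n_mem = 16, 8, 32

# Pre-defined learnable queries: a free parameter matrix, independent of
# the encoder output (in a real model these are trained via backprop).
queries = rng.normal(size=(n_queries, d_model))

# Encoder output: used only as the decoder's memory (keys/values).
memory = rng.normal(size=(n_mem, d_model))

def cross_attention(q, mem):
    """Single-head cross-attention: queries attend to encoder memory."""
    scores = q @ mem.T / np.sqrt(q.shape[-1])          # (n_queries, n_mem)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over memory
    return weights @ mem                               # (n_queries, d_model)

out = cross_attention(queries, memory)
print(out.shape)  # (8, 16)
```

Note that `queries` never depends on `memory`; by contrast, a query generator would compute the queries from the encoder output itself.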
Thanks, I'll check it.
Hi, Xumin. It's a great paper and I'm inspired a lot. I have a question about the ablation experiment in the paper. In the baseline test, when the result produced by the query generator is replaced, how are the Dynamic Queries of the Transformer decoder generated? Thanks a lot.