Open
Description
The model configuration is not the same as described in the paper. There is a softmax layer missing at the end of the model. The paper concatenates the attention * vision features for all the glimpses and then pass it through a single linear layer. You use non-linearity both times before and after fusion.
Metadata
Metadata
Assignees
Labels
No labels