Hello, thanks for your great work! I read your paper in detail and noticed that you evaluated Qwen-VL on DUE-Benchmark tasks that are not reported in its official paper, such as Deepform, KLC, WTQ, TableFact, and VisualMRC. If possible and convenient, could you share the generation config you used for Qwen-VL so I can reproduce your results (e.g. do_sample, max_new_tokens, top_p, top_k, length_penalty, and so forth)? Many thanks!
Additionally, I guess you may have used DUE_evaluator as your evaluation script — is that right?
For the first question: we have updated the evaluation code, which includes the full generation config.
We borrowed some of DUE_evaluator's methods; since we converted the tasks into a question-answer format, we used accuracy rather than F1 for some metrics.
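For anyone looking for a starting point before checking the released code, a deterministic (greedy) decoding setup like the one below is a common baseline for reproducing benchmark numbers. The specific values here are illustrative placeholders, not the authors' confirmed settings — please refer to the updated evaluation code for the actual config.

```python
# Hypothetical generation config for benchmark reproduction.
# NOTE: these values are placeholders for illustration only;
# the real settings are in the released evaluation code.
generation_config = {
    "do_sample": False,     # greedy decoding => deterministic, reproducible outputs
    "max_new_tokens": 128,  # short answers suffice for extractive QA benchmarks
    "num_beams": 1,         # no beam search
    "top_p": 1.0,           # ignored when do_sample is False
    "top_k": 0,             # ignored when do_sample is False
    "length_penalty": 1.0,  # no length bias
}

print(generation_config["do_sample"], generation_config["max_new_tokens"])
```

With `do_sample=False` and `num_beams=1`, the sampling-related keys (`top_p`, `top_k`) have no effect, which is why greedy decoding is a safe default when exact reproducibility matters.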
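To make the accuracy-instead-of-F1 point concrete, here is a minimal sketch of the exact-match accuracy one typically computes once a task has been cast as question answering. The normalization (strip + lowercase) is an assumption for illustration, not necessarily the normalization used in the authors' script.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference
    after stripping whitespace and lowercasing (a common, but here
    assumed, normalization)."""
    if not references:
        return 0.0
    normalize = lambda s: s.strip().lower()
    correct = sum(normalize(p) == normalize(r)
                  for p, r in zip(predictions, references))
    return correct / len(references)

# Example: 2 of 3 answers match exactly.
score = exact_match_accuracy(["Paris", " 42 ", "cat"], ["paris", "42", "dog"])
print(score)  # 2/3
```

Unlike token-level F1, this metric gives no partial credit, which is why the choice between the two can shift reported numbers on the same predictions.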