Hello, thanks for your great work! I read your paper in detail and noticed that you evaluated Qwen-VL on DUE-Benchmark tasks that are not reported in its official paper, such as Deepform, KLC, WTQ, TableFact, and VisualMRC. If possible and convenient, could you share the generation config you used for Qwen-VL so I can reproduce your results (e.g. do_sample, max_new_tokens, top_p, top_k, length_penalty, and so forth)? Many thanks!
Additionally, I guess you may have used DUE_evaluator as your evaluation script — is that right?
For the first question: we have updated the evaluation code, which includes the full generation config.
We borrowed some of DUE_evaluator's methods; since we converted the tasks into a question-answer format, we used accuracy rather than F1 for some metrics.
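For anyone looking for a starting point before checking the released code, a deterministic (greedy) decoding setup like the one below is a common baseline for reproducing benchmark numbers. The specific values here are illustrative placeholders, not the authors' confirmed settings — please refer to the updated evaluation code for the actual config.

```python
# Hypothetical generation config for benchmark reproduction.
# NOTE: these values are placeholders for illustration only;
# the real settings are in the released evaluation code.
generation_config = {
    "do_sample": False,     # greedy decoding => deterministic, reproducible outputs
    "max_new_tokens": 128,  # short answers suffice for extractive QA benchmarks
    "num_beams": 1,         # no beam search
    "top_p": 1.0,           # ignored when do_sample is False
    "top_k": 0,             # ignored when do_sample is False
    "length_penalty": 1.0,  # no length bias
}

print(generation_config["do_sample"], generation_config["max_new_tokens"])
```

With `do_sample=False` and `num_beams=1`, the sampling-related keys (`top_p`, `top_k`) have no effect, which is why greedy decoding is a safe default when exact reproducibility matters.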
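To make the accuracy-instead-of-F1 point concrete, here is a minimal sketch of the exact-match accuracy one typically computes once a task has been cast as question answering. The normalization (strip + lowercase) is an assumption for illustration, not necessarily the normalization used in the authors' script.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference
    after stripping whitespace and lowercasing (a common, but here
    assumed, normalization)."""
    if not references:
        return 0.0
    normalize = lambda s: s.strip().lower()
    correct = sum(normalize(p) == normalize(r)
                  for p, r in zip(predictions, references))
    return correct / len(references)

# Example: 2 of 3 answers match exactly.
score = exact_match_accuracy(["Paris", " 42 ", "cat"], ["paris", "42", "dog"])
print(score)  # 2/3
```

Unlike token-level F1, this metric gives no partial credit, which is why the choice between the two can shift reported numbers on the same predictions.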