
The evaluation setting of Qwen-VL #7

Closed
Coobiw opened this issue Nov 22, 2023 · 1 comment

Comments


Coobiw commented Nov 22, 2023

Hello, thanks for your great work! I read your paper in detail and noticed that you evaluated Qwen-VL on DUE-Benchmark tasks that are not reported in its official paper, such as Deepform, KLC, WTQ, TableFact, and VisualMRC. If possible and convenient, could you share the generation config of Qwen-VL (do_sample, max_new_tokens, top_p, top_k, length_penalty, and so forth) so that I can reproduce your results? Thanks very much!
Additionally, am I right in guessing that you used DUE_evaluator as your evaluation script?
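
For reference, here is roughly how I would pass such a generation config through Hugging Face transformers. The loading code follows the public Qwen-VL usage, but every generation value below is a placeholder I made up, not your actual setting:

```python
# Illustrative sketch only: all generation values are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL", device_map="auto", trust_remote_code=True
).eval()

# Hypothetical DUE-style input: a document image plus a question.
query = tokenizer.from_list_format([
    {"image": "path/to/document.png"},
    {"text": "What is the gross amount?"},
])
inputs = tokenizer(query, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=False,      # placeholder; top_p/top_k only apply if do_sample=True
    max_new_tokens=128,   # placeholder
    top_p=0.9,            # placeholder
    top_k=40,             # placeholder
    length_penalty=1.0,   # placeholder
)
# Decode only the newly generated tokens.
answer = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```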


MelosY (Collaborator) commented Nov 22, 2023

For the first question: we have updated the evaluation code, which includes the full generation config.
For the second: we did borrow some of DUE_evaluator's methods, but since we converted the tasks into a question-answer format, we used accuracy rather than F1 for some metrics.
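
As a minimal sketch of what accuracy over converted question-answer pairs can look like (illustrative only, not the repository's actual metric code; the string normalization here is an assumption):

```python
import re

def normalize(text: str) -> str:
    """Lowercase, trim, and collapse whitespace (assumed normalization)."""
    return re.sub(r"\s+", " ", text.strip().lower())

def qa_accuracy(predictions, references) -> float:
    """Exact-match accuracy over QA pairs; references[i] lists the
    acceptable answers for question i."""
    correct = sum(
        normalize(pred) in {normalize(ref) for ref in refs}
        for pred, refs in zip(predictions, references)
    )
    return correct / len(predictions)

# Hypothetical usage:
preds = ["$12,000", "yes"]
refs = [["$12,000", "12,000"], ["Yes"]]
print(qa_accuracy(preds, refs))  # 1.0
```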
