Inquiry regarding Table 2 Reproduction and Evaluation Prompts #23

@chilouis

Hello,

I am currently working on reproducing the results presented in Table 2 of the DriveBench paper. I have two specific questions regarding the experimental setup:

  1. Data Specification for Inference

Could you clarify whether the inference results for all models were obtained using drivebench-test.json or drivebench-test-final.json?

Additionally, I would appreciate it if you could explain the motivation behind adding the test-final version, particularly for handling single-image cases.

  2. Evaluation Prompt Consistency

I noticed a potential discrepancy between the PERCEPTION_VQA_PROMPT in the repository and the version shown in Figure 23 of the paper. Could you confirm which of the two is the intended version?

Since the paper mentions various prompt types (e.g., rubric-aware, context-aware), could you specify which evaluation prompt was used to generate the results in Table 2?

Thank you for your time and for sharing this valuable research.
